oneillkza closed 3 years ago
Hi, I have not tested the Biocontainers Docker container via Singularity before. There might be a memory problem. How much memory did you allocate for this job?
I was running this on one of our "general purpose" hosts that has 1.5TB of RAM. There were other users running jobs, but there should have been at least 700-800GB free.
Is there some minimal example data I can use to test that the container is working correctly?
Currently, there is no test set, but I'm working on it for future versions. Is this problem still persisting?
Hi @cytham!
Yes, unfortunately this problem is still persisting. If you want to try to debug it, I have a minimal-ish bam you can run:
https://www.bcgsc.ca/downloads/koneill/nanovar_test/lra_1_1_2_running_id.sorted.bam
https://www.bcgsc.ca/downloads/koneill/nanovar_test/lra_1_1_2_running_id.sorted.bam.bai
It's the first 10Mb of chr20, from some NA24385 data we sequenced internally.
Hi @oneillkza, sorry for the wait. I have released a new version (v1.3.9). I suggest installing the new version in a fresh environment and running it again.
@zhemingfan Hi, did you post an issue?
Hi @cytham I did - I wanted to retry something, but it didn't end up working. I'm not sure if this is something on pysam's end - but I was getting
Traceback (most recent call last):
  File "/home/jfan/miniconda3/bin/nanovar", line 494, in <module>
    main()
  File "/home/jfan/miniconda3/bin/nanovar", line 301, in main
    run.bam_parse_detect()
  File "/home/jfan/miniconda3/lib/python3.8/site-packages/nanovar/nv_characterize.py", line 76, in bam_parse_detect
    = bam_parse(self.bam, self.minlen, self.splitpct, self.minalign, self.dir, self.filter, self.contig_omit)
  File "nanovar/nv_bam_parser.pyx", line 67, in nanovar.nv_bam_parser.bam_parse
  File "pysam/libcalignedsegment.pyx", line 2399, in pysam.libcalignedsegment.AlignedSegment.get_tag
  File "pysam/libcalignedsegment.pyx", line 2438, in pysam.libcalignedsegment.AlignedSegment.get_tag
KeyError: "tag 'AS' not present"
After converting the BAM file to a SAM, adding an AS tag to the SAM file, then re-converting it to a BAM file, the error persists. Do you have any suggestions?
Hi @zhemingfan, may I know how the bam file was generated? Was it through NanoVar? Or did you align it with your own aligner?
@cytham we're using minimap2, but would likely also be interested in trying NanoVar with LRA.
Could I please ask what you mean by "through NanoVar"? I was under the impression (from the readme) that NanoVar took an aligned bam as input.
Ok, minimap2 should be fine. Not sure if LRA will be compatible though.
NanoVar can also take FASTA/FASTQ files as input, which it first aligns with minimap2.
Hi @cytham, thank you for all your help! I was able to get NanoVar working with minimap2, but after re-doing the alignment, LRA is still giving the same KeyError.
I believe LRA is not compatible with NanoVar. It seems to be lacking the 'AS' tag in its output SAM file after alignment. Are there any parameters to add the 'AS' tag to the LRA output?
When you said you've added the 'AS' tag manually, did you also add the alignment score that comes with the tag? Because this is required in NanoVar.
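To check whether a SAM file carries a usable tag before running NanoVar, a rough sketch along these lines may help. It assumes the BAM has been dumped to SAM text (for example with `samtools view -h`) and relies only on the SAM convention that optional fields start at column 12 and look like `TAG:TYPE:VALUE`, so a complete tag reads `AS:i:<integer>`. The function name `count_records_missing_as` is made up for this illustration and is not part of NanoVar or pysam.

```python
def count_records_missing_as(sam_lines):
    """Count alignment records that have no integer AS tag (AS:i:<score>)."""
    missing = 0
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        # Columns 1-11 are mandatory; optional TAG:TYPE:VALUE fields follow.
        optional = line.rstrip("\n").split("\t")[11:]
        if not any(field.startswith("AS:i:") for field in optional):
            missing += 1
    return missing


if __name__ == "__main__":
    # Two toy records: the first has AS:i:, the second only a float NV tag.
    demo = [
        "@HD\tVN:1.6\n",
        "r1\t0\tchr20\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:8\n",
        "r2\t0\tchr20\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\tNV:f:7.5\n",
    ]
    print(count_records_missing_as(demo))
```

A non-zero count means at least one record would trigger the KeyError shown above when the tag is looked up.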
Yeah LRA puts the alignment score in a custom tag "NV" rather than in "AS" as is customary. It looks like their score is also a float rather than an int, which may be why they chose to use a custom tag?
Looking at the values themselves, they seem to be more-or-less in the same range as minimap2 AS values, although maybe about two-fold smaller. Also, the floating-point values seem to have about 4-6 significant digits to the left of the decimal point, so I doubt there'd be any meaningful loss of precision from rounding them to integers.
One potential solution would be to generate AS tags for an LRA bam from the NV tag but rounded to the nearest integer.
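A minimal sketch of that idea, working on SAM text rather than through pysam, might look like the following. It assumes the LRA score is stored as a float-typed optional field (`NV:f:<value>`), as described above, and that it is acceptable to leave records that already have an AS tag untouched. The function name `add_as_from_nv` is hypothetical, and whether NanoVar produces sensible calls from scores derived this way has not been tested.

```python
def add_as_from_nv(sam_line, source_tag="NV"):
    """Return the SAM record with AS:i:<rounded score> appended.

    The score is taken from a float-typed source tag (assumed NV:f:<value>).
    Header lines, records that already carry AS, and records without the
    source tag are returned unchanged.
    """
    if sam_line.startswith("@"):
        return sam_line
    newline = "\n" if sam_line.endswith("\n") else ""
    fields = sam_line.rstrip("\n").split("\t")
    optional = fields[11:]  # optional TAG:TYPE:VALUE fields start at column 12
    if any(field.startswith("AS:") for field in optional):
        return sam_line
    prefix = f"{source_tag}:f:"
    for field in optional:
        if field.startswith(prefix):
            # Note: Python's round() sends exact .5 ties to the even integer.
            score = round(float(field[len(prefix):]))
            fields.append(f"AS:i:{score}")
            return "\t".join(fields) + newline
    return sam_line


if __name__ == "__main__":
    record = "r1\t0\tchr20\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNV:f:1234.6"
    print(add_as_from_nv(record))
```

One way to use it would be to stream `samtools view -h` output through this function line by line and pipe the result back into `samtools view -b` to rebuild the BAM.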
@oneillkza thanks for looking into this and raising it with the authors of LRA. Yes, I believe replacing NV with AS might work, though we would still have to test whether pysam, which NanoVar uses to read the SAM file, can read float values for that tag. I would hope that the LRA authors change their tag labels to follow the current SAM format so that their SAM files are readable by existing packages such as pysam.
I'm running from the Biocontainers Docker container via Singularity. The data set is a GM24385 test set we sequenced on a PromethION flowcell. I subsampled the first 10Mbp of chr20 for this test. It gets to the "neural network inference" step and then segmentation faults.