Open duceppemo opened 2 years ago
Thanks for checking out vsnp3 and reporting the issues you've seen.
I've had inconsistent results with vcftools and bcftools. I typically see bcftools installed via the freebayes requirement, so I have left it out of the explicit requirements list. The same goes for vcftools, which comes in with vcflib. When specifying the install explicitly, I've fought with conda installing bcftools as a Python 2 tool even when asking for Python 3. I've had the best results leaving them out of the explicit requirements and letting them be installed as requirements of freebayes and vcflib. The same applies to libcrypto (and other libraries).
Beyond leaving comments like this here to help other users, I am convinced that, because everyone's environment is slightly different, conda may require troubleshooting either to "fix" a user's environment or to fix something being overlooked by conda.
That being said, I should look at replacing these tools since they're often problematic. I did this for pysam/samtools: these tools would often (but not always) cause conflicting libraries, so pysam was removed from vsnp3. I will soon be working on providing vsnp3 as a container. Hopefully this will ease installation, or at least provide another option.
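For anyone hitting the same missing-dependency problem, here is a minimal sketch of a check to run inside the activated conda environment before starting step 1. It is not part of vsnp3; the names `REQUIRED_TOOLS` and `missing_tools` are hypothetical, and the tool list is only what is mentioned in this thread, so adjust it to your setup.

```python
import shutil

# Tool names taken from this thread (assumption: they are called by these
# executable names on your system); extend or trim as needed.
REQUIRED_TOOLS = ["vcftools", "bcftools", "freebayes"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All tools found on PATH.")
```

If something is reported as missing, that points at the environment rather than at the pipeline itself, which may help narrow down whether conda skipped a dependency.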
Nanopore support is beta at best, especially since the technology is steadily changing. Can you share the FASTQ file you're using? If so, I would like to troubleshoot.
Sourmash runs quickly, and I like seeing the "best reference" even when one is specified. It should still be using the reference you specified. I'm going to update the wording so there isn't confusion.
I would like to improve Nanopore support. This has been a first test to see how it might work, and only a few datasets have been tried so far. This input is good to get.
Hi Tod,
I've been trying the nanopore support of vSNP3 and I think it still needs to be optimized.
First, when installing vSNP3 with conda, it lacks 2 dependencies: vcftools and bcftools. To get a fully working pipeline (I only tested step 1 so far), I had to run:
A few warnings are printed in the terminal while step 1 runs. The command I used:
The terminal output:
As you can see, the top reference has a very low % value. It still picks the right one, but this part of the pipeline is not optimized for Nanopore. Also, why is it still looking for the best reference if we already told it which one to use?
The log file looks like this:
The main issue right now is that the mpileup step (using bcftools) takes about 5 h per sample. I just can't rerun all my samples with vSNP3 if it takes that long!
Here's the content of the Excel stats file:
So, any plans on improving support for Nanopore? I haven't actually tested vSNP3 on paired-end data yet, so I don't know whether the speed problem is specific to Nanopore. Let me know if you need more info.
Thanks! Marco