Shasta's default assembly parameters are generally not optimal for any particular situation. Assembly parameters usually need to be tuned to the data being used, and for this reason we provide a few sample configuration files in directory `shasta/conf`. I suggest that you start with configuration file `shasta/conf/Nanopore-Sep2020.conf` and then optimize from there. Use command line option `--config` to specify the configuration file. You can download the configuration files from GitHub or get them from the tar file for a release.
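For example, an invocation along these lines (the input file name here is just a placeholder):

```
shasta --input your-reads.fasta --config shasta/conf/Nanopore-Sep2020.conf
```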
If you try that and you still don't get a satisfactory assembly, please post the entire assembly log output plus file `AssemblySummary.html` from the assembly directory and I can help. It would also help if you could post your entire input file, perhaps in compressed form, if you don't mind doing that.
From the information you have given me so far I can't tell what is going on, but given that the MinHash step finds nothing, there may be a problem in the code that generates the simulated reads. I will be able to assess this better if you provide the information mentioned above.
Testing now. Thank you.
I am closing this due to lack of discussion, but feel free to reopen it or create a new one if new discussion items emerge.
I am developing a script to simulate reads by taking a small (5 Mb) section of hg38 and randomly selecting regions ranging in length from 1,000 bp to 15,000 bp. Eventually I was going to add random errors to these reads, but first I wanted to try reassembling the error-free reads for comparison. I had hoped that the result would be the originally selected sequence.
I selected 20,000 reads, generating a 160 MB `nanopore.exact.fasta`.
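The script does something roughly equivalent to the following shell sketch (simplified; it assumes `samtools` and GNU `shuf` are available, and that `region.fasta` holds the 5 Mb hg38 slice as a single record named `region`; all file and record names are placeholders):

```
# Rough sketch of the read simulator; file and record names are placeholders.
samtools faidx region.fasta                        # build the .fai index once
for i in $(seq 1 20000); do
    len=$(shuf -i 1000-15000 -n 1)                 # random read length
    start=$(shuf -i 1-$((5000000 - len)) -n 1)     # random 1-based start position
    samtools faidx region.fasta "region:${start}-$((start + len - 1))" \
        | sed "1s/.*/>read_${i}/" >> nanopore.exact.fasta
done
```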
If I run it simply with ... it completes quite quickly, but it discards many reads for being shorter than the default minimum of 10,000 bp. It produces an assembly file containing 39 contigs, which is less than desirable.
I changed `--Reads.minReadLength` to 1000, but then everything seems to get dumped at or before the LowHash step.
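That run looked roughly like this (other options omitted):

```
shasta --input nanopore.exact.fasta --Reads.minReadLength 1000
```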
I thought I'd just create a better data set, since I had that luxury, so I created a FASTA file with 5,000 reads ranging from 10,000 to 100,000 bp. Running `shasta` on this data set ... also seems to ignore all reads at the LowHash step.
Does anyone have any suggestions?