Closed SEVEN-XYCHEN closed 2 months ago
hi @SEVEN-XYCHEN , apologies for the delay in my reply. We do not have any control over the development of nanopolish, but that being said, m6Anet indexes the nanopolish eventalign.txt file before preprocessing it, accessing only the relevant parts of the file at any given time to prevent memory overflow. So far I've succeeded in running m6anet dataprep on 1 TB of eventalign.txt, but let me know if you encounter any difficulty with it. Usually I don't keep the eventalign.txt file for too long after preprocessing.
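The indexing idea described above can be sketched roughly as follows. This is a minimal, hypothetical illustration of seek-based access to a large eventalign-style TSV, not m6Anet's actual implementation; the function names and the assumption that rows for one transcript are contiguous are mine:

```python
def build_index(path):
    """Hypothetical sketch: map each transcript (first TSV column) to a
    (start_offset, end_offset) byte range, so its rows can later be read
    with a single seek instead of loading the whole file into memory."""
    index = {}
    with open(path, "rb") as f:
        f.readline()                       # skip the header line
        offset = f.tell()
        current, start = None, offset
        for line in iter(f.readline, b""):
            contig = line.split(b"\t", 1)[0]
            if contig != current:          # new transcript block begins
                if current is not None:
                    index[current] = (start, offset)
                current, start = contig, offset
            offset = f.tell()
        if current is not None:            # close the final block
            index[current] = (start, offset)
    return index

def read_transcript(path, index, contig):
    """Read only the rows belonging to one transcript via seek."""
    start, end = index[contig.encode()]
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start).decode()
```

With an index like this, memory use depends on the largest per-transcript block, not on the total eventalign.txt size, which is consistent with dataprep handling terabyte-scale files.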
Hi, @chrishendra93 Thank you very much for your reply, it has been very helpful to me. I will continue to use m6Anet for analysis, which is a great software. Best, Chen
Hello, @chrishendra93!
I have the same issue, but I want to understand how much memory I need for this file. Do you have this information, or a formula for estimating it? My eventalign.txt file is already 2 TB before the run has finished.
Characteristics of my files: reads-ref.sorted.bam 3.5 GB, reads.fastq 5.7 GB, fast5 files 169 GB.
hi @VikArz02 , I cannot really tell, as it depends on nanopolish eventalign's ability to segment the raw files. If you have high-quality fast5 files, then it will be able to resolve most of the segments and your eventalign.txt file might take up a bit more storage space. This will not affect m6Anet's memory requirements, but it will affect its running time, since you'll have more sites to process. This was raised in #128 as well, so that m6Anet can be run from a compressed nanopolish eventalign file, which we might explore in a future release.
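As a rough illustration of the idea raised in #128, a gzip-compressed eventalign file can be consumed line by line without ever materializing the uncompressed text on disk or in memory. This is a hypothetical sketch, not an m6Anet feature; the `contig` and `position` column names follow nanopolish eventalign's header, but the function itself is an assumption:

```python
import gzip

def stream_sites(path):
    """Hypothetical sketch: yield (contig, position) pairs from a
    gzip-compressed eventalign TSV, decompressing one line at a time
    so memory use stays constant regardless of file size."""
    with gzip.open(path, "rt") as f:
        header = f.readline().rstrip("\n").split("\t")
        contig_i = header.index("contig")
        pos_i = header.index("position")
        for line in f:
            fields = line.rstrip("\n").split("\t")
            yield fields[contig_i], int(fields[pos_i])
```

Since gzip typically compresses eventalign text several-fold, streaming from the compressed file would trade some CPU time for a large saving in storage.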
Thanks!
Hi @chrishendra93, m6Anet has been a nice tool for us to use lately, but the datasets are getting bigger and bigger and file management is starting to become a problem with the eventalign step. Is there any chance m6anet dataprep could take the output from eventalign through a pipe? That way we never have to write the eventalign data to disk? This is something that both yanocomp and nanocompore do and it really helps with space management because the processed eventalign files tend to be one tenth the size of the raw eventalign file. Thanks for continuing active development with m6Anet and I'm looking forward to seeing what else is done with it!
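For context, the streaming approach mentioned above amounts to consuming the eventalign rows from a pipe as they are produced, so the raw file never hits the disk. A minimal sketch of the mechanism (the producer here is just a stand-in for `nanopolish eventalign`; m6anet dataprep does not currently accept piped input):

```python
import subprocess
import sys

# Hypothetical sketch: a producer process writes eventalign-style rows
# to stdout, and the consumer reads them incrementally from the pipe.
# Nothing is written to disk, and memory use is one line at a time.
producer = subprocess.Popen(
    [sys.executable, "-c",
     "print('contig\\tposition'); print('tx1\\t100'); print('tx1\\t101')"],
    stdout=subprocess.PIPE, text=True,
)

n_rows = 0
for line in producer.stdout:   # blocks until the producer emits each line
    n_rows += 1                # a real consumer would parse/aggregate here
producer.wait()
print(n_rows)                  # header plus two data rows
```

The catch for m6Anet is that its current dataprep relies on indexing, i.e. random access into eventalign.txt, which a pipe cannot provide, so supporting this would require a single-pass preprocessing mode.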
Hi @lmulroney , thanks for your comment! We hope to improve the file handling in a future version, but we don't have a release timeline yet. We will post an update here.
Hello, I encountered a memory issue. My fastq file is 5.9 GB, my bam file is 9.3 GB, my transcript.fa file is 1.8 GB, and there are 453 GB of fast5 files under the fast5 folder. I first ran nanopolish index, which completed successfully. Then I ran eventalign, but the generated eventalign.txt file is too large: it's already 835 GB before the run has finished. Is this reasonable? Is the intermediate file supposed to be this large? Is there any way to improve it? Thank you very much for taking the time to answer my question! Looking forward to your reply.