ababaian / serratus

Ultra-deep search for novel viruses
http://serratus.io
GNU General Public License v3.0

Assembler benchmarking #72

Closed rcedgar closed 4 years ago

rcedgar commented 4 years ago

See notes on how to benchmark assemblers here:

200503_rce_assembler_benchmark_notes.pdf

Anyone up for taking on this task?

ababaian commented 4 years ago

@JustinChu / @taltman this will be up your alley: measuring how well we can assemble new CoV.

JustinChu commented 4 years ago

Hi @rcedgar, can you point me to the pangenomes/datasets that you created for the alignment experiments?

The evaluation protocol I have in mind is as follows:

I'll need a pangenome with the strain being tested removed (is up to 80% what we have tested?), the reference sequence of that strain, and possibly libraries that are positive for the strain (though I may be able to simply simulate data instead).
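The hold-out step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `read_fasta` and `hold_out` and the accessions `strainA`/`strainB` are hypothetical, the strain is matched by exact accession (the first token of the FASTA header), and no identity-based filtering (such as the 80% figure mentioned) is attempted.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def hold_out(records: Iterable[Record], held_out_id: str) -> Tuple[List[Record], List[Record]]:
    """Split records into (pangenome minus the test strain, held-out reference)."""
    kept: List[Record] = []
    held: List[Record] = []
    for header, seq in records:
        # Assumption: the accession is the first whitespace-delimited token.
        parts = header.split()
        accession = parts[0] if parts else ""
        if accession == held_out_id:
            held.append((header, seq))
        else:
            kept.append((header, seq))
    return kept, held


# Hypothetical toy input; real use would pass an open file handle instead.
toy = [">strainA test strain", "ACGT", "AC", ">strainB", "GGGG"]
pangenome, reference = hold_out(read_fasta(toy), "strainA")
```

Here `pangenome` would serve as the search reference and `reference` as the ground truth that assembled contigs are compared against.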

Maybe you could clarify what your folder on the s3 bucket contains so I can perhaps reuse the files. For instance, what do the FASTA files in the /r or /q directories contain?

rcedgar commented 4 years ago

"I'll need a pangenome with the strain being tested removed (is up to 80% what we have tested?)" -- yes, exactly! See benchmark notes here which explain the s3 files:

200430_covx_benchmark_howto.pdf

JustinChu commented 4 years ago

Ah, that is what I was looking for, thanks!

taltman commented 4 years ago

I think a lot of what we want to do here can be done using MetaQUAST: http://quast.sourceforge.net/metaquast
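As a point of reference for the kind of contiguity statistic such tools report, N50 can be computed by hand from contig lengths. The sketch below is a minimal stand-alone illustration, not MetaQUAST's implementation; the function name `n50` is hypothetical.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    Returns 0 for an empty assembly.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if running * 2 >= total:
            return length
    return 0
```

N50 only measures contiguity; it says nothing about correctness, which is why reference-based measures against the held-out strain matter for this benchmark.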

One thing it does poorly is aligning the short reads back to the assemblies: it takes forever. At least, that is my recollection from a large metagenomics assembly of a human gut sample. Perhaps that is an optimization that @rcedgar would be best positioned to tackle? We'll generate some data on how it runs with our filtered reads, and may need to address performance if it is still an issue.

rcedgar commented 4 years ago

This is my best attempt at writing a very fast read mapper:

https://drive5.com/urmap/manual/downloads.html

ababaian commented 4 years ago

Closed by #130