Open michaelbarton opened 7 years ago
@pbelmann
evaluate a genome assembly in FASTA format using optional multiple reference genome sequences in FASTA format
Do you use the reference assembly biobox without a reference? I forgot that we originally defined this as being optional. The GAET biobox requires a reference as it compares the two sets of genome annotations.
I also noticed that we define contig and scaffold as the possible options of the fasta value. Would the software act differently depending on what they were? This is a trivial point; however, I think it's generally good to simplify the RFCs wherever possible.
@pbelmann
> evaluate a genome assembly in FASTA format using optional multiple reference genome sequences in FASTA format
> Do you use the reference assembly biobox without a reference? I forgot that we originally defined this as being optional. The GAET biobox requires a reference as it compares the two sets of genome annotations.
Yes, and I think we should leave it optional for evaluating assemblies where you don't have a reference.
> I also noticed that we define contig and scaffold as the possible options of the fasta value. Would the software act differently depending on what they were? This is a trivial point; however, I think it's generally good to simplify the RFCs wherever possible.
Well, the idea was to define the input according to the short read assembly output definition. But I think we are not using it, so I would say we can remove this, and maybe also remove it from the short read assembler output interface.
> Yes, and I think we should leave it optional for evaluating assemblies where you don't have a reference.
I think this is fine for QUAST, but GAET cannot run without a reference. Some tools might be able to generate metrics without a reference, but others will need it.
> Well, the idea was to define the input according to the short read assembly output definition. But I think we are not using it, so I would say we can remove this, and maybe also remove it from the short read assembler output interface.
I agree. Going further, it might be useful to have a list of the terms we use and what they specifically mean.
> > Yes, and I think we should leave it optional for evaluating assemblies where you don't have a reference.
> I think this is fine for QUAST, but GAET cannot run without a reference. Some tools might be able to generate metrics without a reference, but others will need it.
We could make the reference in the reference-based interface mandatory and introduce a third, reference-free interface. QUAST could implement both, and GAET just the reference-based one.
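To make the proposed split concrete, here is a rough sketch of how the two interfaces could be modelled as a check on a tool's inputs. This is only an illustration: `Interface`, `AssemblyInput`, `validate_input`, and the `SUPPORTED` table are hypothetical names, not part of any existing biobox RFC or validator. It assumes only what is proposed above, namely that the reference-based interface requires at least one reference and the reference-free interface takes none.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Interface(Enum):
    # Hypothetical identifiers for the two proposed interfaces.
    REFERENCE_BASED = "reference_based"
    REFERENCE_FREE = "reference_free"


@dataclass
class AssemblyInput:
    # Path to the assembly to evaluate (FASTA).
    assembly_fasta: str
    # Paths to reference genomes (FASTA); empty when none are supplied.
    reference_fastas: List[str] = field(default_factory=list)


def validate_input(interface: Interface, inp: AssemblyInput) -> None:
    """Raise ValueError if the input does not match the chosen interface."""
    if interface is Interface.REFERENCE_BASED and not inp.reference_fastas:
        raise ValueError("reference-based interface requires at least one reference")
    if interface is Interface.REFERENCE_FREE and inp.reference_fastas:
        raise ValueError("reference-free interface does not accept references")


# Assumed mapping taken from the discussion: QUAST could implement both
# interfaces, GAET only the reference-based one.
SUPPORTED = {
    "quast": {Interface.REFERENCE_BASED, Interface.REFERENCE_FREE},
    "gaet": {Interface.REFERENCE_BASED},
}
```

With a split like this, a tool that cannot run without a reference would simply not declare the reference-free interface, rather than failing at run time.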
When bioboxes/rfc#204 is completed, the bioboxes file validator should check that the mandatory metrics file is produced by the assembly validator.
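A minimal sketch of what such a check might look like, assuming the metrics file lands in the container's output directory. The file name `metrics.txt` and the function `check_metrics_file` are placeholders chosen for illustration; the real name and format would be whatever bioboxes/rfc#204 ends up specifying.

```python
from pathlib import Path
from typing import Optional


def check_metrics_file(output_dir: str, metrics_name: str = "metrics.txt") -> Optional[str]:
    """Return an error message if the metrics file is missing or empty, else None.

    `metrics_name` is a hypothetical default, not a name taken from the RFC.
    """
    path = Path(output_dir) / metrics_name
    if not path.is_file():
        return f"mandatory metrics file not found: {path}"
    if path.stat().st_size == 0:
        return f"mandatory metrics file is empty: {path}"
    return None
```

Returning a message instead of raising lets a validator collect several problems and report them together.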