bioboxes / rfc

Request for comments on interchangeable bioinformatics containers
http://bioboxes.org
MIT License

The bioboxes validator task should ensure a metric TSV file is generated #205

Open michaelbarton opened 7 years ago

michaelbarton commented 7 years ago

When bioboxes/rfc#204 is completed, the bioboxes file validator should check that the mandatory metrics file is produced by the assembly validator.
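As a rough sketch of what such a check could look like: the file name `metrics.tsv`, the function name `check_metrics_tsv`, and the two-column key/value layout are all assumptions made for illustration, not something defined in the RFC.

```python
import csv
from pathlib import Path


def check_metrics_tsv(output_dir, filename="metrics.tsv"):
    """Return a list of problems found; an empty list means the check passed.

    Both the default file name and the two-field layout are hypothetical.
    """
    path = Path(output_dir) / filename
    if not path.is_file():
        return [f"missing metrics file: {path}"]

    problems = []
    non_blank_rows = 0
    with path.open(newline="") as handle:
        for number, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
            if not row:
                # csv.reader yields an empty list for a blank line; skip it.
                continue
            non_blank_rows += 1
            if len(row) != 2:
                problems.append(
                    f"line {number}: expected 2 tab-separated fields, got {len(row)}"
                )

    if non_blank_rows == 0:
        problems.append(f"metrics file is empty: {path}")
    return problems
```

The validator task could fail the biobox whenever the returned list is non-empty and print the entries as the error report.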

michaelbarton commented 7 years ago

@pbelmann

evaluate a genome assembly in FASTA format using optional multiple reference genome sequences in FASTA format

Do you use the reference assembly biobox without a reference? I forgot that we originally defined this as being optional. The GAET biobox requires a reference as it compares the two sets of genome annotations.

michaelbarton commented 7 years ago

I also noticed that we define contig and scaffold as the possible options of the fasta value. Would the software act differently depending on what they were? This is a trivial point; however, I think it's generally good to simplify the RFCs wherever possible.

pbelmann commented 7 years ago

@pbelmann

evaluate a genome assembly in FASTA format using optional multiple reference genome sequences in FASTA format

Do you use the reference assembly biobox without a reference? I forgot that we originally defined this as being optional. The GAET biobox requires a reference as it compares the two sets of genome annotations.

Yes and I think we should leave it optional for evaluating assemblies where you don't have a reference.

I also noticed that we define contig and scaffold as the possible options of the fasta value. Would the software act differently depending on what they were? This is a trivial point; however, I think it's generally good to simplify the RFCs wherever possible.

Well, the idea was to define the input according to the short read assembly output definition. But I think we are not using it, so I would say we can remove this, and maybe also remove it from the short read assembler output interface.

michaelbarton commented 7 years ago

Yes and I think we should leave it optional for evaluating assemblies where you don't have a reference.

I think that is fine for QUAST, but GAET cannot run without a reference. Some tools might be able to generate metrics without a reference, but others will need it.

michaelbarton commented 7 years ago

Well, the idea was to define the input according to the short read assembly output definition. But I think we are not using it, so I would say we can remove this and maybe also in the output short read assembler interface.

I agree. Going further, it might be useful to have a list of the terms we use and what they specifically mean.

pbelmann commented 7 years ago

Yes and I think we should leave it optional for evaluating assemblies where you don't have a reference.

I think that is fine for QUAST, but GAET cannot run without a reference. Some tools might be able to generate metrics without a reference, but others will need it.

We could make the reference in the reference-based interface mandatory and introduce a third, reference-free interface. QUAST could implement both, GAET just the reference-based one.
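To make the split concrete, here is a minimal sketch of how the two interfaces might be modelled. The class names, the field names, and the `SUPPORTED_INTERFACES` table are hypothetical and only illustrate the idea; none of them come from the RFC.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReferenceFreeInput:
    """Hypothetical input for the reference-free interface."""
    assembly_fasta: str


@dataclass
class ReferenceBasedInput:
    """Hypothetical input for the reference-based interface."""
    assembly_fasta: str
    reference_fastas: List[str]

    def __post_init__(self):
        # In this interface the reference is mandatory, so reject an empty list.
        if not self.reference_fastas:
            raise ValueError("the reference-based interface needs at least one reference")


# Which interfaces each tool would implement under this proposal (illustrative only).
SUPPORTED_INTERFACES = {
    "quast": {"reference_based", "reference_free"},
    "gaet": {"reference_based"},
}


def can_run(tool, has_reference):
    """Return True if the tool implements the interface matching the available input."""
    interface = "reference_based" if has_reference else "reference_free"
    return interface in SUPPORTED_INTERFACES[tool]
```

With this layout, `can_run("gaet", has_reference=False)` is false while the same call for `"quast"` is true, which matches the behaviour described above.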