EBI-Metagenomics / EukCC

Tool to estimate genome quality of microbial eukaryotes
GNU General Public License v3.0

Ability to choose which node to use for completeness estimation #15

Closed jamiemcg closed 3 years ago

jamiemcg commented 3 years ago

Hello,

Is it possible to manually choose which node/lineage of marker genes to use for completeness and contamination estimates?

The automatically selected node is a good choice most of the time, but for obscure genomes it can be quite off. It would be useful to be able to choose or force which set of markers is used for the assessment.

Thanks!

openpaul commented 3 years ago

Thanks for this feature request. That's definitely a feature I have on the list for the next version.

Do you have any input on how you want to define nodes? By node name, or would you rather give related taxids and have the node defined from those?

Do you think users know which node they want to use?

jamiemcg commented 3 years ago

Thanks @openpaul!

I think I would find this most useful as an optional parameter to use if the automatically selected lineages didn't look right. This would allow different assemblies to be directly compared (i.e. guaranteeing they are assessed using the same node).

I guess choosing node names makes sense.

openpaul commented 3 years ago

This should now be possible in version 0.3.

Simply add the argument:

--node node121,node120

Use a comma-separated list of nodes (or just one node). Let me know if that works for you.
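
For readers wondering how a comma-separated value like the one above can be turned into a list of node names, here is a minimal Python sketch using `argparse`. It is only an illustration and is not taken from EukCC's source: the helper `parse_node_list`, the function `build_parser`, and the help text are hypothetical names chosen for this example, and EukCC's real handling of `--node` may differ.

```python
import argparse


def parse_node_list(value):
    """Split a string such as 'node121,node120' into a list of node names.

    Hypothetical helper for illustration; not part of EukCC.
    """
    # Drop surrounding whitespace and ignore empty entries (e.g. a trailing comma).
    nodes = [name.strip() for name in value.split(",") if name.strip()]
    if not nodes:
        raise argparse.ArgumentTypeError("expected at least one node name")
    return nodes


def build_parser():
    """Build a toy parser with an optional --node argument (assumed shape)."""
    parser = argparse.ArgumentParser(
        description="Illustrative sketch only, not EukCC's actual parser"
    )
    parser.add_argument(
        "--node",
        type=parse_node_list,
        default=None,  # None stands for "let the tool pick the node automatically"
        help="comma-separated list of node names, or a single node name",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--node", "node121,node120"])
    print(args.node)
```

Leaving the default as `None` keeps the automatic selection as the normal behaviour, so the option only takes effect when a user explicitly passes it, which matches the "optional parameter" use case described earlier in the thread.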

jamiemcg commented 3 years ago

@openpaul great, thanks. Works as expected.