LottePronk / whokaryote

Classify metagenomic contigs as eukaryotic or prokaryotic
GNU Affero General Public License v3.0
29 stars, 7 forks

Handling empty feature table #11

Open TimothyStephens opened 1 month ago

TimothyStephens commented 1 month ago

Hi,

Thank you for your work on whokaryote.

I have encountered a bug when whokaryote is run on very small MAGs in which no valid features are identified (a bit of an edge case, I know). The error arises at line 90 of predict_class.py: the features DataFrame is empty, which causes predictions = loaded_rf.predict(features) to raise a ValueError: Found array with 0 sample(s) (shape=(0, 9)) while a minimum of 1 is required.

A workaround for this problem is to replace line 90 with the following:

    predictions = []
    if not features.empty:
        predictions = loaded_rf.predict(features)

I believe this change should preserve the normal behavior of whokaryote.
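To show the guard in isolation, here is a minimal sketch that runs without whokaryote or scikit-learn. StubModel and predict_or_empty are hypothetical names used only for illustration: the stub is assumed to mimic the "0 sample(s)" ValueError described above, and its labelling rule is made up. The check uses len(features) == 0, which is true for a pandas DataFrame with zero rows as well as for a plain list.

```python
class StubModel:
    """Hypothetical stand-in for the loaded random forest (not whokaryote code)."""

    def predict(self, rows):
        # Mimic the reported behaviour: refuse input with zero samples.
        if len(rows) == 0:
            raise ValueError(
                "Found array with 0 sample(s) while a minimum of 1 is required."
            )
        # Dummy rule, for illustration only: label by the first feature value.
        return ["eukaryote" if row[0] > 0.5 else "prokaryote" for row in rows]


def predict_or_empty(model, features):
    """Return the model's predictions, or an empty list when there are no rows."""
    if len(features) == 0:
        # Nothing to classify, so skip the call that would raise ValueError.
        return []
    return list(model.predict(features))
```

With this helper, an empty feature table yields an empty list of predictions instead of an exception, while non-empty input is passed to the model unchanged.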

Thanks, Tim.

LottePronk commented 4 weeks ago

Hi Tim,

Thank you for using Whokaryote and for taking the time to look into this error.

I will look into the solution and implement it when I have time.

One thing to keep in mind: if the features DataFrame is empty, whokaryote cannot make any predictions. Tiara should still work, though, and you can check its predictions in the featuretable output file.

I'm always curious about what people use Whokaryote for, as it may be useful to expand its functionality in the future. If I may ask, for what purpose are you running it on MAGs?

Kind regards, Lotte

TimothyStephens commented 5 days ago

Hi Lotte,

Thanks for letting me know about Tiara.

My current use case is a Snakemake workflow for assembling MAGs from all domains of life (prokaryotes, eukaryotes, and viruses). Because genome sizes vary so widely across these groups (viruses can have tiny genomes), some of the MAGs being considered are quite small, which is how I ran into this error.

Cheers, Tim.