RasmussenLab / phamb

Downstream processing of VAMB binning for Viral Elucidation
MIT License

Random Forest Feature Names #33

Closed by carrgomoo 2 years ago

carrgomoo commented 2 years ago

Hello developers! Thank you very much for putting this tool out into the world!

I ran the random forest model with the new recommended phamb dependencies like this:

python mag_annotation/scripts/run_RF.py ../contgs.fna clusters.tsv annotations resultdir

and was given this error message:

"/home/user/miniconda3/envs/phamb/lib/python3.9/site-packages/sklearn/base.py:450: UserWarning: X does not have valid feature names, but RandomForestClassifier was fitted with feature names"

The binning output was still produced, but I'm wondering whether the model ran correctly. Why might this error be occurring?

Thank you very much! Carrie

joacjo commented 2 years ago

Hi Carrie

Thanks for your interest in the method! It's only a warning and should not affect the output, but thanks for letting me know. I will look into the code and push an update so users will not see this warning. The warning appears because the RF model was trained on a pandas dataframe with named columns, and scikit-learn looks for those names when you run the RF prediction on a dataframe without column names. I have hardcoded the order of the columns so that they match the model, so this does not affect model performance.

Thank you!

Best, Joachim