Carrion-lab / bacLIFE


both mapping_file.txt and mapping_file_augmented.txt showed that the lifestyle of all strains was unknown #9

Closed wsyjh closed 2 months ago

wsyjh commented 2 months ago

I ran bacLIFE on the genomes of 257 Pseudomonas type strains, but both mapping_file.txt and mapping_file_augmented.txt list the lifestyle of every strain as unknown, and no errors were reported during the run. Any ideas?

gguerr001 commented 2 months ago

The mapping_file.txt created by the bacLIFE clustering module is not annotated: every strain is labelled "Unknown". You are expected to fill in the "Lifestyle" column yourself with the metadata you are interested in (e.g. plant_pathogen, animal_pathogen, environmental...). Strains whose metadata you don't know should be left as "Unknown" in this column. The lifestyle prediction module learns from the strains you annotated in "mapping_file.txt" and tries to predict the ones left as "Unknown". The file "mapping_file_augmented.txt" is created by the lifestyle prediction module and assigns a lifestyle to the strains you left as "Unknown" only when the prediction is confident enough. If no strain is annotated in "mapping_file.txt", there is nothing to learn from, which is why all strains are still "Unknown" in both files.
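If you have many strains, filling in the column by hand is tedious, so here is a minimal sketch of how it could be scripted. It is not part of bacLIFE. It assumes the mapping file is tab-separated with a header row, and that the strain identifier column is called "Sample"; that column name, the strain names and the `annotate_lifestyles` helper are all hypothetical, so check them against your own mapping_file.txt. Only the "Lifestyle" column name and the "Unknown" label come from the description above.

```python
import csv
import io


def annotate_lifestyles(mapping_text, known, id_column="Sample", delimiter="\t"):
    """Return the mapping file text with the "Lifestyle" column filled in.

    mapping_text: full contents of mapping_file.txt as a string.
    known: dict mapping strain identifier -> lifestyle label.
    id_column: name of the strain identifier column (an assumption; adjust it).
    Strains missing from `known` keep whatever value they already had,
    which is "Unknown" in a freshly generated file.
    """
    reader = csv.DictReader(io.StringIO(mapping_text), delimiter=delimiter)
    fieldnames = reader.fieldnames
    if not fieldnames or id_column not in fieldnames or "Lifestyle" not in fieldnames:
        raise ValueError("expected columns %r and 'Lifestyle'" % id_column)

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # Overwrite the label only for strains we actually have metadata for.
        row["Lifestyle"] = known.get(row[id_column], row["Lifestyle"])
        writer.writerow(row)
    return out.getvalue()


# Made-up example data, not real bacLIFE output.
example = "Sample\tLifestyle\nstrain_A\tUnknown\nstrain_B\tUnknown\n"
annotated = annotate_lifestyles(example, {"strain_A": "plant_pathogen"})
print(annotated)
```

To use it on a real file, read mapping_file.txt into a string, pass your own strain-to-lifestyle dictionary, and write the result back before running the lifestyle prediction module.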