BioDataFuse / pyBiodatafuse

Python package for biodatafuse project.
MIT License

Mismatch when combining sources. #107

Open adriaque opened 6 months ago

adriaque commented 6 months ago

I think I found an issue, possibly caused by the utils.combine_sources function, that produces a mismatch for some annotators when their outputs are combined into a single table. Not all annotator data frames have the same length, and after combining them I see a mismatch for the MINERVA and WikiPathways annotators. Coincidentally, those two annotators generate their input data with data_df = get_identifier_of_interest(bridgedb_df, "NCBI Gene") rather than data_df = get_identifier_of_interest(bridgedb_df, "Ensembl"), which the OpenTargets annotator, for example, uses. Because the identifier source differs and the mapping is not perfect, the length of the data frame depends on which source ID is used, and that causes a mismatch when the data frames are merged.
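To make the suspected failure mode concrete, here is a minimal sketch using toy data only. It does not call pyBiodatafuse: select_source is a hypothetical stand-in for get_identifier_of_interest, the column names and identifier values are assumptions, and this thread does not establish how utils.combine_sources actually joins the tables. It only shows that two data frames of different length get misaligned when combined by row position, while a merge on a shared key keeps them aligned.

```python
import pandas as pd

# Toy mapping table (hypothetical values and column names).
# GENE_A has no NCBI Gene mapping, so the two source subsets differ in length.
bridgedb_df = pd.DataFrame(
    {
        "identifier": ["GENE_A", "GENE_B", "GENE_C", "GENE_B", "GENE_C"],
        "target.source": ["Ensembl", "Ensembl", "Ensembl", "NCBI Gene", "NCBI Gene"],
        "target": ["ENSG_A", "ENSG_B", "ENSG_C", "1002", "1003"],
    }
)


def select_source(df: pd.DataFrame, source: str) -> pd.DataFrame:
    """Hypothetical stand-in for get_identifier_of_interest: keep one target source."""
    return df[df["target.source"] == source].reset_index(drop=True)


ensembl_df = select_source(bridgedb_df, "Ensembl")    # 3 rows
ncbi_df = select_source(bridgedb_df, "NCBI Gene")     # 2 rows

# Pretend each annotator returns the input identifier plus one annotation column.
ensembl_annot = ensembl_df[["identifier", "target"]].rename(columns={"target": "annot_ensembl"})
ncbi_annot = ncbi_df[["identifier", "target"]].rename(columns={"target": "annot_ncbi"})

# Combining by row position: GENE_A is paired with GENE_B's NCBI-based value.
positional = pd.concat([ensembl_annot, ncbi_annot[["annot_ncbi"]]], axis=1)

# Combining on the shared input identifier: rows stay aligned, and the
# missing mapping shows up as NaN instead of a shifted value.
merged = ensembl_annot.merge(ncbi_annot, on="identifier", how="left")

print(positional)
print(merged)
```

If the real annotator outputs share the original input identifier column, comparing the two results in this way on the actual data could help confirm or rule out the length difference as the cause.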

@tabbassidaloii @YojanaGadiya

YojanaGadiya commented 6 months ago

Thank you @adriaque for raising this. @tabbassidaloii is now taking care of this in a dedicated PR.

tabbassidaloii commented 6 months ago

I couldn't find a bug in the scripts. @adriaque, can you please share a screenshot of the table or the input so I can reproduce the error?