AlexanderKroll / ESP

MIT License

Generation of 'KEGG_drugs_df.pkl' and 'KEGG_substrate_df.pkl' #18

Closed fmoorhof closed 7 months ago

fmoorhof commented 8 months ago

Hi Alex & community, thank you for your great work and also assisting in so many of the issues.

I am stuck on executing the notebook 1_0 - Creating enzyme-substrate database from GOA database.ipynb, where the two pickles (KEGG_drugs_df.pkl and KEGG_substrate_df.pkl) are supposed to be read. How can I generate these files? When I search the entire repository for the code that generates them, I only find the pd.read_pickle(... calls that read them.

Thanks for any hints. Best regards, Felix

AlexanderKroll commented 7 months ago

Hi Felix,

Those files were generated by extracting data from KEGG, and the generation process is not part of this repository. Is there a particular reason why you need to reproduce those files instead of using the existing ones?

Best, Alex

fmoorhof commented 7 months ago

Hi Alex,

Thanks for your answer. I had already assumed the files were an export and format conversion from KEGG. The reason I asked is that I was not aware of tools like Git LFS, which you need to use separately to pull all the pickle files from the remote (see issue #8) before you can read them. The solution in issue #8 points to Python and package inconsistencies instead. Hence, I tried to generate all pickles myself from the code, which unfortunately failed because of these 2 'missing files' (missing because they contain only the pointer to the remote, without the content). However, after running git lfs pull everything works fine and you can actually read all the pickles :)
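
For anyone who hits the same problem, here is a minimal sketch of how one might check whether a .pkl file in a checkout is still a Git LFS pointer rather than the real pickle. It relies on the fact that a Git LFS pointer file is a small text file whose first line starts with `version https://git-lfs.github.com/spec/v1`. The helper names `is_lfs_pointer` and `find_lfs_pointers` are my own and are not part of the ESP repository, and the directory scanned in the usage part (the current directory) is only an assumption about where the checkout is.

```python
from pathlib import Path

# First line of a Git LFS pointer file, as defined by the Git LFS pointer spec.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` starts like a Git LFS pointer.

    Hypothetical helper for illustration; not part of the ESP repository.
    """
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_lfs_pointers(root, pattern="*.pkl"):
    """Return a sorted list of files under `root` that match `pattern`
    and still contain only an LFS pointer instead of the real content."""
    return sorted(
        p for p in Path(root).rglob(pattern)
        if p.is_file() and is_lfs_pointer(p)
    )


if __name__ == "__main__":
    # Assumption: the script is run from the root of the cloned repository.
    for pointer_file in find_lfs_pointers("."):
        print(f"Still an LFS pointer (try `git lfs pull`): {pointer_file}")
```

If this prints any paths, those files have not been downloaded yet, and pd.read_pickle will fail on them until git lfs pull has been run.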