NPLinker / nplinker

A python framework for data mining microbial natural products by integrating genomics and metabolomics data
https://nplinker.github.io/nplinker
Apache License 2.0

how is `strain_mappings.csv` generated? #148

Closed CunliangGeng closed 1 year ago

CunliangGeng commented 1 year ago
justinjjvanderhooft commented 1 year ago

@CunliangGeng the strain mapping file is normally provided manually by the user - it is the key information that links the genomics data to the metabolomics data. Only when a dataset is downloaded from the PoDP are these connections loaded into NPLinker automatically. Of course, this step is tricky, as the user may not get the format of the mapping file completely correct. Do you have any suggestions on how to make this step less error-prone and thus more robust?
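One way to catch format mistakes early is to check the mapping file before it is loaded. The snippet below is only a minimal sketch, not NPLinker's actual loader: it assumes (the thread does not confirm this) that each row holds a strain ID in the first cell followed by that strain's aliases, and `check_strain_mappings` is a hypothetical helper name.

```python
import csv
import io


def check_strain_mappings(text):
    """Return a list of human-readable problems found in the mapping text.

    Assumed layout (not confirmed in this thread): one strain per row,
    first cell is the strain ID, remaining cells are its aliases.
    """
    problems = []
    owner = {}  # identifier -> strain ID of the row that first used it
    for lineno, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        cells = [c.strip() for c in row]
        if not any(cells):
            continue  # skip fully blank lines
        if "" in cells:
            problems.append(f"line {lineno}: empty cell")
        names = [c for c in cells if c]
        if len(names) < 2:
            problems.append(f"line {lineno}: strain '{names[0]}' has no alias")
        strain = names[0]
        for name in names:
            # Remember which strain first claimed this identifier.
            previous = owner.setdefault(name, strain)
            if previous != strain:
                problems.append(
                    f"line {lineno}: '{name}' already belongs to strain '{previous}'"
                )
    return problems
```

Calling it on the contents of a user-supplied file returns an empty list when nothing suspicious is found, so the messages could be shown to the user before any linking starts.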

CunliangGeng commented 1 year ago

@justinjjvanderhooft This step is indeed a pain point. I think we could take the following measures to improve it:

justinjjvanderhooft commented 1 year ago

Thanks for the suggestions. I agree that PoDP is a great entry point, but in practice many users will start from local files - possibly from BiG-SCAPE results and/or Molecular Networking runs they have already produced. The GUI tool sounds like a great suggestion - how much work would that be? Might it be a nice project for an intern?

CunliangGeng commented 1 year ago

It took experienced engineers more than half a year in total to develop cffinit (see the dev history plot), so I guess the GUI tool would require a similar amount of effort. I think it's a very good internship project.

justinjjvanderhooft commented 1 year ago

Wow, that is quite an effort indeed. Something to consider - if an intern is interested, please do encourage them to take up this challenge - at least we could make a start on it. We could re-use bits and pieces of the PoDP add form, since in one of its steps we basically create the mapping file from previously generated information and direct links to the publicly available metabolomics data files.

CunliangGeng commented 1 year ago

This tool should not run in a browser, as a browser would restrict it from detecting files on the user's machine. So I don't think we could reuse the PoDP code (a web app running in the browser). The tool would be better as a desktop application with a graphical user interface.
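To illustrate what detecting files on the user's machine could look like, here is a rough sketch of the core step such a desktop tool might perform before any GUI is involved. Everything in it is an assumption for illustration: the `.gbk` and `.mzML` extensions, the idea of pairing files by a shared filename stem, and the hypothetical `propose_mappings` helper are not taken from NPLinker.

```python
from pathlib import Path


def propose_mappings(genome_dir, spectra_dir,
                     genome_ext=".gbk", spectra_ext=".mzML"):
    """Pair local files whose filename stems match and report the leftovers.

    Returns (rows, unmatched): rows are (stem, genome file, spectra file)
    tuples; unmatched lists stems present in only one of the two folders.
    """
    genomes = {p.stem: p.name for p in Path(genome_dir).glob(f"*{genome_ext}")}
    spectra = {p.stem: p.name for p in Path(spectra_dir).glob(f"*{spectra_ext}")}
    paired = sorted(genomes.keys() & spectra.keys())
    rows = [(stem, genomes[stem], spectra[stem]) for stem in paired]
    unmatched = sorted(genomes.keys() ^ spectra.keys())
    return rows, unmatched
```

A GUI could show `rows` as suggested mappings for the user to confirm and ask the user to resolve the `unmatched` entries by hand.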