NPLinker / nplinker

A python framework for data mining microbial natural products by integrating genomics and metabolomics data
https://nplinker.github.io/nplinker
Apache License 2.0

Exception: Failed to find *ANY* strains, missing strain_mappings.csv? #170

Closed (wkipandula closed this issue 9 months ago)

wkipandula commented 1 year ago

Hi there, can someone please take a look at my attached strain_mappings file and tell me if I am missing something? I keep getting the "No strains found" or the "Failed to find ANY strains, missing strain_mappings.csv" errors. Please help.

strain_mappings.csv

justinjjvanderhooft commented 1 year ago

@CunliangGeng - are you aware of any recent changes in the expected file format? @wkipandula, did you find an example mapping file?

CunliangGeng commented 1 year ago

Hi @wkipandula,

If you did not upload your data to PODP, then you will also have to add the MS files to strain_mappings.csv. For example, if "10a.mzXML" and "10b.mzXML" have the strain label "MIAP_547A_A", then you need to append them to the first line (MIAP_547A_A) of your strain_mappings.csv.
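If editing the file by hand is error-prone, the step above could be scripted. The following is only a minimal sketch: it assumes each row of strain_mappings.csv holds a strain label in the first column followed by its aliases, and `add_aliases` is a hypothetical helper, not part of NPLinker.

```python
import csv
from pathlib import Path


def add_aliases(mapping_path, strain_label, aliases):
    """Append aliases to the row whose first column equals strain_label.

    Hypothetical helper, not an NPLinker function. Assumes one strain per
    row: the strain label first, then any number of aliases.
    """
    path = Path(mapping_path)
    with path.open(newline="") as fh:
        rows = [row for row in csv.reader(fh) if row]  # skip blank lines

    found = False
    for row in rows:
        if row[0] == strain_label:
            found = True
            for alias in aliases:
                if alias not in row:  # avoid duplicate entries
                    row.append(alias)
    if not found:
        raise ValueError(f"strain label {strain_label!r} not found in {path}")

    # Rewrite the file in place with the extended row(s).
    with path.open("w", newline="") as fh:
        csv.writer(fh).writerows(rows)
    return rows
```

For the example above, the call would be `add_aliases("strain_mappings.csv", "MIAP_547A_A", ["10a.mzXML", "10b.mzXML"])`. It is worth keeping a backup copy of the file first, since the sketch overwrites it.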

wkipandula commented 1 year ago

I am using the webapp version. See unknown_strains_gen.csv and unknown_strains_met.csv. I think the problem is in parsing the labels of the gene/.gbk files.

unknown_strains_gen.csv unknown_strains_met.csv common_strains.csv strain_mappings.csv

CunliangGeng commented 1 year ago

OK, I assume that you're using version 1.2.0.

The unknown_strains_gen.csv file shows that the BGC names from the BiG-SCAPE clustering file are not recognised. You could debug this in two ways:

  1. Make sure your .gbk files are organised like the structure below:

    antismash
    ├── GCF_000016425.1
    │   ├── NC_009380.1.region001.gbk
    │   ├── NC_009380.1.region002.gbk
    │   └── NC_009380.1.region003.gbk
    ├── GCF_000018265.1
    │   ├── NC_009953.1.region001.gbk
    │   ├── NC_009953.1.region002.gbk
    │   ├── NC_009953.1.region003.gbk
    ...
  2. If the first solution does not work, the names of the .gbk files might also matter. For example, MIAP_547A_C_00023_NODE_23...region001 should be changed to MIAP_547A_C_00023_NODE_23.region001, keeping only a single dot.
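
Both checks can be run as a quick script before loading the data. This is a rough sketch under stated assumptions: the `antismash` folder is laid out as shown above (one subfolder per genome, .gbk files directly inside it), and `collapse_dots` and `check_antismash_dir` are hypothetical helper names, not NPLinker functions. It only reports problems and does not rename anything.

```python
import re
from pathlib import Path


def collapse_dots(name):
    """Replace any run of two or more dots with a single dot."""
    return re.sub(r"\.{2,}", ".", name)


def check_antismash_dir(root):
    """Report layout and naming problems under an antismash folder.

    Returns (misplaced, renames):
      misplaced: .gbk files that are not exactly one folder below root
      renames:   (current_path, suggested_path) pairs for names with
                 consecutive dots
    """
    root = Path(root)
    gbk_files = sorted(root.rglob("*.gbk"))

    # Expected layout: root/<genome_id>/<name>.gbk
    misplaced = [p for p in gbk_files if p.parent.parent != root]

    renames = []
    for p in gbk_files:
        fixed = collapse_dots(p.name)
        if fixed != p.name:
            renames.append((p, p.with_name(fixed)))
    return misplaced, renames


if __name__ == "__main__":
    misplaced, renames = check_antismash_dir("antismash")
    for p in misplaced:
        print(f"unexpected location: {p}")
    for old, new in renames:
        print(f"suggested rename: {old.name} -> {new.name}")
```

With this sketch, a file named `MIAP_547A_C_00023_NODE_23...region001.gbk` would be listed with the suggested name `MIAP_547A_C_00023_NODE_23.region001.gbk`.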