franciscozorrilla / metaGEM

:gem: An easy-to-use workflow for generating context specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from metagenomic data
https://franciscozorrilla.github.io/metaGEM/
MIT License

error in organizing dataset folder into sample-specific folders #83

Status: Closed (zoey-rw closed 2 years ago)

zoey-rw commented 3 years ago

The organizeData rule doesn't work on my samples because their file names contain multiple underscores (e.g. "HARV_001-O-20180710-COMP-DNA1_R1.fastq.gz"). The rule cuts each name at the first underscore, so this sample ID is reduced to "HARV". A simple fix would be to split the sample names at the last underscore instead of the first.

Original:

# Create list of unique sample IDs
for file in *.gz; do
    echo $file;
done | sed 's/_.*$//g' | sed 's/.fastq.gz//g' | uniq > ID_samples.txt
echo -e " done.\n $(less ID_samples.txt|wc -l) samples identified.\n"

Fixes the problem:

# Create list of unique sample IDs
for file in *.gz; do
    echo $file;
done | sed 's/_[^_]*$//g' | sed 's/.fastq.gz//g' | uniq > ID_samples.txt
echo -e " done.\n $(less ID_samples.txt|wc -l) samples identified.\n"
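
To make the difference concrete, here is a minimal sketch that runs both sed expressions on the example file name above. The R2 mate file name and the helper names strip_first and strip_last are assumptions for illustration only; they are not part of the Snakefile.

```shell
# Hypothetical helpers; only the two sed expressions come from this thread.

# Original pattern: the greedy .* removes everything from the FIRST underscore on.
strip_first() { sed 's/_.*$//g'; }

# Proposed pattern: [^_]* cannot cross an underscore, so only the LAST
# underscore-delimited field (e.g. _R1.fastq.gz) is removed.
strip_last() { sed 's/_[^_]*$//g'; }

# Example name from the issue, plus an assumed R2 mate.
files="HARV_001-O-20180710-COMP-DNA1_R1.fastq.gz
HARV_001-O-20180710-COMP-DNA1_R2.fastq.gz"

echo "$files" | strip_first | uniq   # prints: HARV
echo "$files" | strip_last | uniq    # prints: HARV_001-O-20180710-COMP-DNA1
```

File names with no underscore at all pass through the first sed unchanged with either pattern; in that case it is the second sed ('s/.fastq.gz//g') that strips the extension.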

franciscozorrilla commented 2 years ago

Hi Zoey,

Thank you very much for the suggested improvement. I will implement it as soon as I am back from vacation!

Best wishes, Francisco

P.S. If you want to show up as a contributor to this repo, you can create a PR: click on the Snakefile, then the little pen icon at the top right, which should make the file editable for you. Once you have made your changes, add a short description of what you did and submit at the bottom of the page.

franciscozorrilla commented 2 years ago

Hi Zoey, your suggestion has been implemented in the latest commit (https://github.com/franciscozorrilla/metaGEM/commit/87100d170589713a6dff90f8597bf540e545d97d). Really sorry if the original behavior caused problems with your data!

Best, Francisco