MGI-tech-bioinformatics / SARS-CoV-2_Multi-PCR_v1.0

SARS-CoV-2 analysis pipeline for multiplex-PCR MPS (Massively Parallel Sequencing) data

Faulty regex terms in the data generation step #1

Closed: shepp-hub closed this issue 3 years ago

shepp-hub commented 3 years ago

Hi,

In the first automatically generated script (step0.GenerateData.sh), the sequencing files are selected with the following wildcard pattern (strictly speaking a shell glob rather than a regular expression):

ln -s /datasets/covid/VID/*1_1.fq.gz /datasets/covid/VID/result/VXXXXXX/01.Clean/Raw_VXXXXX_L0X_1_1_1.fq.gz

This is problematic because other sequencing files with the same suffix, such as VXXXXX_L0X_71_71_1.fq.gz or VXXXXX_L0X_21_21_1.fq.gz, also match the pattern and are passed to the same command. I suggest using a stricter pattern to avoid potential issues.
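To make the over-matching concrete, here is a minimal sketch that reproduces it in a throwaway directory. The slide and lane values (V000001, L01) and the helper count_args are hypothetical, chosen only for illustration; just the `*1_1.fq.gz` pattern and the shape of the file names come from the report above.

```shell
# Hypothetical file names modelled on the ones mentioned in this issue.
demo_dir=$(mktemp -d)
touch "$demo_dir/V000001_L01_1_1_1.fq.gz" \
      "$demo_dir/V000001_L01_21_21_1.fq.gz" \
      "$demo_dir/V000001_L01_71_71_1.fq.gz"

# Helper (hypothetical): print how many arguments the shell passed in,
# i.e. how many paths the glob expanded to. If nothing matches, the
# unexpanded pattern itself is passed and this prints 1.
count_args() { echo "$#"; }

# The loose pattern matches every file whose name ends in "1_1.fq.gz".
loose_count=$(count_args "$demo_dir"/*1_1.fq.gz)
echo "loose pattern matched $loose_count file(s)"
```

With these three files the loose pattern expands to all of them, so `ln -s` would receive several source paths instead of the single intended one.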

meizhiying commented 3 years ago

Hi, thanks for your suggestion. In this situation, you can change the second column of sample.list from '1_1' to 'VXXXXX_L0X_1_1'. Using slide + lane + barcode (bc) will be safer.
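The effect of the fuller identifier can be sketched as follows. This thread does not show how the pipeline builds its pattern from sample.list, so the `*<identifier>_1.fq.gz` form used here is an assumption, as are the slide and lane values (V000001, L01), the helper count_args, and the extra check that exactly one file matched before linking.

```shell
# Hypothetical file names modelled on the ones mentioned in this issue.
demo_dir=$(mktemp -d)
touch "$demo_dir/V000001_L01_1_1_1.fq.gz" \
      "$demo_dir/V000001_L01_21_21_1.fq.gz" \
      "$demo_dir/V000001_L01_71_71_1.fq.gz"

# Assumed sample.list second column: slide + lane + barcode.
prefix="V000001_L01_1_1"

# Helper (hypothetical): print how many paths the glob expanded to.
count_args() { echo "$#"; }

# With the full identifier in the pattern, only the intended file matches.
strict_count=$(count_args "$demo_dir"/*"${prefix}"_1.fq.gz)
echo "strict pattern matched $strict_count file(s)"

# Extra guard (not part of the pipeline): link only on an unambiguous match.
src="$demo_dir/${prefix}_1.fq.gz"
out_dir="$demo_dir/01.Clean"
mkdir -p "$out_dir"
if [ "$strict_count" -eq 1 ] && [ -e "$src" ]; then
    ln -s "$src" "$out_dir/Raw_${prefix}_1.fq.gz"
fi
```

The guard is optional, but it stops the script before a wrong or ambiguous file is linked if the naming scheme ever changes.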

MGI-tech-bioinformatics commented 3 years ago

issue closed