faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

Yet another illumiprocessor name error #308

Closed: louisfnastasi closed this issue 8 months ago

louisfnastasi commented 11 months ago

Hi all,

I've seen the other threads on this issue (e.g., https://github.com/faircloth-lab/phyluce/issues/96 and https://github.com/faircloth-lab/phyluce/issues/208) but haven't found a suitable solution - sorry for yet another question like these!

I've been running the following:

illumiprocessor \
    --input raw-fastq/ \
    --output clean-fastq \
    --config trim_testconfig.conf \
    --cores 12 \
    --r1-pattern "{}R1\d+.fastq.gz" \
    --r2-pattern "{}R2_\d+.fastq.gz"

Our file names are formatted like so:

RAPiD-Genomics_F300-F301_PST_174201_P001_WA01_i5-538_i7-97_S6053_L003_R1_001.fastq.gz
RAPiD-Genomics_F300-F301_PST_174201_P001_WA01_i5-538_i7-97_S6053_L003_R2_001.fastq.gz

And here is a brief excerpt of the configuration file for the sample listed above:

[tag map]
RAPiD-Genomics_F300-F301_PST_174201_P001_WA01_i5-538_i7-97_S6053L003:i5-plate-1,i7-WD11

[names]
RAPiD-Genomics_F300-F301_PST_174201_P001_WA01_i5-538_i7-97_S6053L003:Amphibolips_quercusjuglans_CYNOG0048
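Judging from the working example in the reply below, the `{}` in `--r1-pattern`/`--r2-pattern` appears to stand in for the [tag map] key, so each key has to equal the part of the file name that comes before `_R1_`/`_R2_`. The following is a small standalone sketch, not part of illumiprocessor, that lists those prefixes and whether each one has both reads. The `<prefix>_R<1|2>_<digits>.fastq.gz` naming scheme and the `sample_prefixes` helper are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed naming scheme, taken from the example file names in this issue:
# <prefix>_R<1|2>_<digits>.fastq.gz
READ_NAME = re.compile(r"^(?P<prefix>.+)_R(?P<read>[12])_\d+\.fastq\.gz$")

FILES = [
    "RAPiD-Genomics_F300-F301_PST_174201_P001_WA01_i5-538_i7-97_S6053_L003_R1_001.fastq.gz",
    "RAPiD-Genomics_F300-F301_PST_174201_P001_WA01_i5-538_i7-97_S6053_L003_R2_001.fastq.gz",
]


def sample_prefixes(filenames):
    """Hypothetical helper: map each prefix to the set of read numbers ("1", "2") found."""
    found = defaultdict(set)
    for name in filenames:
        match = READ_NAME.match(name)
        if match:  # names that do not fit the assumed scheme are skipped
            found[match.group("prefix")].add(match.group("read"))
    return dict(found)


for prefix, reads in sample_prefixes(FILES).items():
    status = "R1+R2" if reads == {"1", "2"} else "missing a read"
    print(f"{prefix}\t{status}")
```

Passing `os.listdir("raw-fastq")` instead of the hard-coded list would check every file in the input directory, and the printed prefixes can be pasted into the [tag map] and [names] sections as-is.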

The error output I'm receiving is identical to those previously posted:

2023-07-27 15:24:50,049 - illumiprocessor - INFO - ==================== Starting illumiprocessor ===================
2023-07-27 15:24:50,049 - illumiprocessor - INFO - Version: 2.10
2023-07-27 15:24:50,049 - illumiprocessor - INFO - Argument --config: trim_test_config.conf
2023-07-27 15:24:50,049 - illumiprocessor - INFO - Argument --cores: 12
2023-07-27 15:24:50,049 - illumiprocessor - INFO - Argument --input: /storage/group/hmh19/default/trim_test/raw-fastq/raw-fastq
2023-07-27 15:24:50,049 - illumiprocessor - INFO - Argument --log_path: None
2023-07-27 15:24:50,049 - illumiprocessor - INFO - Argument --min_len: 40
2023-07-27 15:24:50,049 - illumiprocessor - INFO - Argument --no_merge: False
2023-07-27 15:24:50,049 - illumiprocessor - INFO - Argument --output: /storage/group/hmh19/default/trim_test/raw-fastq/clean-fastq
2023-07-27 15:24:50,049 - illumiprocessor - INFO - Argument --phred: phred33
2023-07-27 15:24:50,049 - illumiprocessor - INFO - Argument --r1pattern: {}R1\d+.fastq.gz
2023-07-27 15:24:50,049 - illumiprocessor - INFO - Argument --r2pattern: {}R2\d+.fastq.gz
2023-07-27 15:24:50,049 - illumiprocessor - INFO - Argument --se: False
2023-07-27 15:24:50,049 - illumiprocessor - INFO - Argument --trimmomatic: /storage/home/lfn5093/.conda/envs/illumi_env/bin/trimmomatic
2023-07-27 15:24:50,049 - illumiprocessor - INFO - Argument --verbosity: INFO
Traceback (most recent call last):
  File "/storage/home/lfn5093/.conda/envs/illumi_env/bin/illumiprocessor", line 17, in <module>
    sys.exit(main())
  File "/storage/home/lfn5093/.conda/envs/illumi_env/lib/python3.6/site-packages/illumiprocessor/cli/main.py", line 114, in main
    main(args)
  File "/storage/home/lfn5093/.conda/envs/illumi_env/lib/python3.6/site-packages/illumiprocessor/main.py", line 34, in main
    reads.append(core.SequenceData(args, conf, start_name, end_name))
  File "/storage/home/lfn5093/.conda/envs/illumi_env/lib/python3.6/site-packages/illumiprocessor/core.py", line 85, in __init__
    self._get_read_data()
  File "/storage/home/lfn5093/.conda/envs/illumi_env/lib/python3.6/site-packages/illumiprocessor/core.py", line 106, in _get_read_data
    "errors in your conf file.".format(self.start_name)
OSError: There is a problem with the read names for RAPiD-Genomics_F300-F301_PST_174201_P001_WA01_i5-538_i7-97_S6053L003. Ensure you do not have spelling/capitalization errors in your conf file.
/var/spool/slurm/d/job4761542/slurm_script: line 42: phyluce_assembly_get_fastq_lengths: command not found
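The error message points at spelling and capitalization in the conf file. Here is one way to check both before running illumiprocessor, sketched under two assumptions: keys are compared case-sensitively, and each [tag map] key must be followed directly by `_R1_`/`_R2_` in the file names (as in the working patterns from the reply below). The `check_tag_map` helper is hypothetical. Note that Python's `configparser` lower-cases keys unless `optionxform` is overridden, so a naive parse would hide capitalization mistakes such as `RAPiD` versus `Rapid`.

```python
import configparser

FILES = [
    "RAPiD-Genomics_F300-F301_PST_174201_P001_WA01_i5-538_i7-97_S6053_L003_R1_001.fastq.gz",
    "RAPiD-Genomics_F300-F301_PST_174201_P001_WA01_i5-538_i7-97_S6053_L003_R2_001.fastq.gz",
]

# The conf excerpt exactly as it renders in this thread.
CONF_EXCERPT = """\
[tag map]
RAPiD-Genomics_F300-F301_PST_174201_P001_WA01_i5-538_i7-97_S6053L003:i5-plate-1,i7-WD11

[names]
RAPiD-Genomics_F300-F301_PST_174201_P001_WA01_i5-538_i7-97_S6053L003:Amphibolips_quercusjuglans_CYNOG0048
"""


def check_tag_map(conf_text, filenames):
    """Hypothetical helper: return (key, read) pairs with no file named '<key>_<read>_...'."""
    parser = configparser.ConfigParser(delimiters=(":",))
    parser.optionxform = str  # keep keys exactly as written (the default lower-cases them)
    parser.read_string(conf_text)
    problems = []
    for key in parser["tag map"]:
        for read in ("R1", "R2"):
            # str.startswith is case-sensitive, so 'Rapid-...' would not match 'RAPiD-...'
            if not any(name.startswith(f"{key}_{read}_") for name in filenames):
                problems.append((key, read))
    return problems


for key, read in check_tag_map(CONF_EXCERPT, FILES):
    print(f"no {read} file found for [tag map] key {key}")
```

Run against the excerpt as it renders here, it flags the `..._S6053L003` key for both reads. That may only be a rendering artifact (underscores are missing from other parts of the post as well), but comparing the real conf file against the file names this way would rule it in or out.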

I've already tried numerous expressions for the r1 and r2 patterns, including the following, but none seem to work:

{}R1\d+.fastq.gz
{}R1\d+.fastq.gz
{}R1_\d+.fastq(?:.gz)
{}R1_001.fastq.gz
{}R1_001.fastq.gz
{}R1_001.fastq(?:.gz)
{}R1001.fastq.gz
{}R1\w+.fastq.gz
{}R1\w+.fastq.gz*
{}R1\w+.fastq(?:.gz)*
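To see why none of these match, the patterns can be tried directly against one of the file names. This sketch assumes that illumiprocessor substitutes the [tag map] key for `{}` with `str.format` and that the resulting regular expression must match the whole file name; both are assumptions about its internals. The patterns are copied as they render in this thread, so some underscores may have been lost. As rendered, every pattern in the list puts `R1` directly after `{}`, so once `{}` becomes `..._L003` it cannot line up with `..._L003_R1_001.fastq.gz`. The last entry, the pattern from the reply below, does match.

```python
import re

PREFIX = "RAPiD-Genomics_F300-F301_PST_174201_P001_WA01_i5-538_i7-97_S6053_L003"
R1_FILE = PREFIX + "_R1_001.fastq.gz"

# Distinct patterns as they render in this thread, plus the one from the reply below.
CANDIDATES = [
    r"{}R1\d+.fastq.gz",
    r"{}R1_\d+.fastq(?:.gz)",
    r"{}R1_001.fastq.gz",
    r"{}R1_001.fastq(?:.gz)",
    r"{}R1001.fastq.gz",
    r"{}R1\w+.fastq.gz",
    r"{}R1\w+.fastq.gz*",
    r"{}R1\w+.fastq(?:.gz)*",
    r"{}_R1_\d+.fastq.gz",
]


def matches(pattern, prefix, filename):
    """Assumed behaviour: fill '{}' with the [tag map] key, then match the whole name."""
    return re.fullmatch(pattern.format(prefix), filename) is not None


for pattern in CANDIDATES:
    verdict = "match" if matches(pattern, PREFIX, R1_FILE) else "no match"
    print(f"{pattern:<26} {verdict}")
```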

I'd greatly appreciate any help anyone can offer!

brantfaircloth commented 11 months ago

Things run fine for me with dummy data and a config file (test-2.conf) that looks like this:

[adapters]
i7:CTGTCTCTTATACACATCTCCGAGCCCACGAGAC*ATCTCGTATGCCGTCTTCTGCTTG
i5:CTGTCTCTTATACACATCTGACGCTGCCGACGA*GTGTAGATCTCGGTGGTCGCCGTATCATT

[tag sequences]
i5-538:TGAGTCAG
i7-97:GACGTGAC

[tag map]
RAPiD-Genomics_F300-F301_PST_174201_P001_WA01_i5-538_i7-97_S6053_L003:i5-538,i7-97

[names]
RAPiD-Genomics_F300-F301_PST_174201_P001_WA01_i5-538_i7-97_S6053_L003:Amphibolips_quercusjuglans_CYNOG0048

and a command for illumiprocessor that looks like this:

illumiprocessor --input raw-data-2 --output clean-data-2 --config test-2.conf --r1-pattern "{}_R1_\d+.fastq.gz" --r2-pattern "{}_R2_\d+.fastq.gz"
brantfaircloth commented 11 months ago

Oops - hang on a second...

brantfaircloth commented 11 months ago

Ok - edited first reply to add brackets. That still seems to work A-ok.