faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

Issue with config file / names #347

Open jb23590 opened 2 weeks ago

jb23590 commented 2 weeks ago

Dear Faircloth Lab,

I hope someone can help with an issue that I have been unable to resolve for some time. I am trying to run simple processing of some raw test data for three samples. Every time, I get the following error:

File "/usr/local/miniconda/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/core.py", line 106, in _get_read_data
    "errors in your conf file.".format(self.start_name)
OSError: There is a problem with the read names for BFC_296_TTCGCCAT. Ensure you do not have spelling/capitalization errors in your conf file.

This seems to affect only sample BFC_296; however, it is listed as sample 1, so I am unsure whether further analysis would produce the same error for the subsequent two samples.

The entire session prior to the error is here:

--

[jb23590@ceres ~]$ qrsh
Warning: Permanently added '[compute-3-18.local]:38743' (ECDSA) to the list of known hosts.
Last login: Fri Sep 6 12:10:52 2024 from ceres.local
Rocks Compute Node
Rocks 7.0 (Manzanita)
Profile built 11:59 15-Feb-2024

Kickstarted 14:10 15-Feb-2024
[jb23590@compute-3-18 ~]$ cd PoritesTest/PoritesTestWorking/
[jb23590@compute-3-18 PoritesTestWorking]$ ls
BFC19_263_CCAAGTAG_R1.fastq.gz  BFC_296_TTCGCCAT_R1.fastq.gz  BHPO_001_CCTAGAGA_R1.fastq.gz  PoritesTestReads.o2313183
BFC19_263_CCAAGTAG_R2.fastq.gz  BFC_296_TTCGCCAT_R2.fastq.gz  BHPO_001_CCTAGAGA_R2.fastq.gz  PoritesTestReads.sh
[jb23590@compute-3-18 PoritesTestWorking]$ cd /home/jb23590/PoritesTest/
[jb23590@compute-3-18 PoritesTest]$ cat -> PoritesTestConfig.conf

[adapters]
i7: GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCTCGTATGCCGTCTTCTGCTTG
i5: AATGATACGGCGACCACCGAGATCTACACACACTCTTTCCCTACACGACGCTCTTCCGATCT

[tag sequences]
i7-P701:TTCGCCAT
i7-P702:CCAAGTAG
i7-P703:CCTAGAGA
i5-P501:TTCGTACG
i5-P502:CACAGGAA
i5-P503:GTCCTAAG

[tag map]
BFC_296_TTCGCCAT:i7-P701,i5-P501
BFC19_263_CCAAGTAG:i7-P702,i5-P502
BHPO_001_CCTAGAGA:i7-P703,i5-P503

[names]
BFC_296_TTCGCCAT:BFC_296_IO
BFC19_263_CCAAGTAG:BFC19_263_IO
BHPO_001_CCTAGAGA:BHPO_001_BH

[jb23590@compute-3-18 PoritesTest]$ source /usr/local/miniconda/bin/activate phyluce-1.7.1
(phyluce-1.7.1) [jb23590@compute-3-18 PoritesTest]$ illumiprocessor \
    --input $PWD/PoritesTestWorking/ \
    --output PoritesTestClean \
    --config PoritesTestConfig.conf \
    --cores 3

2024-09-06 12:20:37,954 - illumiprocessor - INFO - ==================== Starting illumiprocessor ===================
2024-09-06 12:20:37,955 - illumiprocessor - INFO - Version: 2.10
2024-09-06 12:20:37,955 - illumiprocessor - INFO - Argument --config: PoritesTestConfig.conf
2024-09-06 12:20:37,955 - illumiprocessor - INFO - Argument --cores: 3
2024-09-06 12:20:37,955 - illumiprocessor - INFO - Argument --input: /home/jb23590/PoritesTest/PoritesTestWorking
2024-09-06 12:20:37,955 - illumiprocessor - INFO - Argument --log_path: None
2024-09-06 12:20:37,956 - illumiprocessor - INFO - Argument --min_len: 40
2024-09-06 12:20:37,956 - illumiprocessor - INFO - Argument --no_merge: False
2024-09-06 12:20:37,956 - illumiprocessor - INFO - Argument --output: /home/jb23590/PoritesTest/PoritesTestClean
2024-09-06 12:20:37,956 - illumiprocessor - INFO - Argument --phred: phred33
2024-09-06 12:20:37,956 - illumiprocessor - INFO - Argument --r1_pattern: None
2024-09-06 12:20:37,956 - illumiprocessor - INFO - Argument --r2_pattern: None
2024-09-06 12:20:37,956 - illumiprocessor - INFO - Argument --se: False
2024-09-06 12:20:37,956 - illumiprocessor - INFO - Argument --trimmomatic: /usr/local/miniconda/envs/phyluce-1.7.1/bin/trimmomatic
2024-09-06 12:20:37,956 - illumiprocessor - INFO - Argument --verbosity: INFO
Traceback (most recent call last):
  File "/usr/local/miniconda/envs/phyluce-1.7.1/bin/illumiprocessor", line 17, in <module>
    sys.exit(main())
  File "/usr/local/miniconda/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/cli/main.py", line 114, in main
    main(args)
  File "/usr/local/miniconda/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/main.py", line 34, in main
    reads.append(core.SequenceData(args, conf, start_name, end_name))
  File "/usr/local/miniconda/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/core.py", line 85, in __init__
    self._get_read_data()
  File "/usr/local/miniconda/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/core.py", line 106, in _get_read_data
    "errors in your conf file.".format(self.start_name)
OSError: There is a problem with the read names for BFC_296_TTCGCCAT. Ensure you do not have spelling/capitalization errors in your conf file.

--

I have tried removing this sample from the config entirely, but then the system gives me an error regarding a missing [names].

The only thing I can think of is the following: BFC_296 is the only individual with a few reads that have an 'N' base at the start of the i5 index (NTCGTACG). For all others this is a T (TTCGTACG). I wonder if this is causing the issue. I have tried listing both sequences in [tag sequences], both individually and together, and the issue still persists.

Thank you, any help is much appreciated.

Regards, Jay

brantfaircloth commented 2 weeks ago

Hi Jay,

Your read files are not named following the pattern that Illumiprocessor expects. If you use:

illumiprocessor --input reads --output PoritesTestClean --config PoritesTestConfig.conf --cores 1 --r1-pattern "{}_R1.fastq(?:.gz)*" --r2-pattern "{}_R2.fastq(?:.gz)*"

It should work as expected. That basically passes new regular expressions to the program that match how your read files are named.
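To see why those patterns fit, here is a minimal Python sketch of the matching idea. It assumes the program substitutes each sample name from [tag map] for the `{}` placeholder and then matches the resulting regular expression against the file names in the input directory; the helper `find_reads` is hypothetical and is not illumiprocessor's actual code. The file names are copied from the `ls` output above.

```python
import re

# File names as listed in the input directory in the original post.
files = [
    "BFC19_263_CCAAGTAG_R1.fastq.gz",
    "BFC19_263_CCAAGTAG_R2.fastq.gz",
    "BFC_296_TTCGCCAT_R1.fastq.gz",
    "BFC_296_TTCGCCAT_R2.fastq.gz",
    "BHPO_001_CCTAGAGA_R1.fastq.gz",
    "BHPO_001_CCTAGAGA_R2.fastq.gz",
    "PoritesTestReads.o2313183",
    "PoritesTestReads.sh",
]

# Sample names taken from the [tag map] section of the config.
samples = ["BFC_296_TTCGCCAT", "BFC19_263_CCAAGTAG", "BHPO_001_CCTAGAGA"]

# The patterns suggested in the reply; "{}" stands for the sample name.
R1_PATTERN = "{}_R1.fastq(?:.gz)*"
R2_PATTERN = "{}_R2.fastq(?:.gz)*"


def find_reads(sample, pattern, filenames):
    """Hypothetical helper: return the file names matching the pattern
    once the sample name has been substituted for the placeholder."""
    regex = re.compile(pattern.format(sample))
    return [name for name in filenames if regex.match(name)]


for sample in samples:
    r1 = find_reads(sample, R1_PATTERN, files)
    r2 = find_reads(sample, R2_PATTERN, files)
    print(sample, r1, r2)
```

Under that assumption, each sample resolves to exactly one R1 and one R2 file, which is what a paired-end run needs. If a sample comes back with an empty list, the file name and the config name disagree somewhere.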

-b