faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

Illumiprocessor error: configparser.MissingSectionHeaderError: File contains no section headers #282

Closed marianamazzochi closed 1 year ago

marianamazzochi commented 1 year ago

I am having some issues when trying to use the Illumiprocessor pipeline, and would be very grateful if one of you could help me figure it out. I am working on macOS Monterey 12.6 and using some example files from Rapid Genomics; my .conf file looks like this:


[adapters]
i7 = GATCGGAAGAGCACACGTCTGAACTCCAGTCACBCBCBCBCATCTCGTATGCCGTCTTCTGCTTG
i5 = AATGATACGGCGACCACCGAGATCTACACBCBCBCBCACACTCTTTCCCTACACGACGCTCTTCCGATCT

[tag sequences]
i5P01WG01 = TCTACTCT
i7P01WG01 = CACCTTAC

[tag map]
RAPiDGenomicsP01WG01 = i5P01WG01,i7P01WG01

[names]
RAPiDGenomicsP01WG01 = apeteJLB07


When I try to run "illumiprocessor --input 1_raw-fastq --output 2_clean-fastq --config illumiprocessor.conf --cores 4", I get the following error (I am pasting the complete error):

2022-10-17 16:25:46,835 - illumiprocessor - INFO - ==================== Starting illumiprocessor ===================
2022-10-17 16:25:46,835 - illumiprocessor - INFO - Version: 2.10
2022-10-17 16:25:46,835 - illumiprocessor - INFO - Argument --config: illumiprocessor.conf
2022-10-17 16:25:46,835 - illumiprocessor - INFO - Argument --cores: 4
2022-10-17 16:25:46,836 - illumiprocessor - INFO - Argument --input: /Users/marimazzochi/Desktop/1_raw-fastq
2022-10-17 16:25:46,836 - illumiprocessor - INFO - Argument --log_path: None
2022-10-17 16:25:46,836 - illumiprocessor - INFO - Argument --min_len: 40
2022-10-17 16:25:46,836 - illumiprocessor - INFO - Argument --no_merge: False
2022-10-17 16:25:46,836 - illumiprocessor - INFO - Argument --output: /Users/marimazzochi/Desktop/2_clean-fastq
2022-10-17 16:25:46,836 - illumiprocessor - INFO - Argument --phred: phred33
2022-10-17 16:25:46,836 - illumiprocessor - INFO - Argument --r1_pattern: _R1
2022-10-17 16:25:46,836 - illumiprocessor - INFO - Argument --r2_pattern: _R2
2022-10-17 16:25:46,836 - illumiprocessor - INFO - Argument --se: False
2022-10-17 16:25:46,836 - illumiprocessor - INFO - Argument --trimmomatic: /Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/bin/trimmomatic
2022-10-17 16:25:46,836 - illumiprocessor - INFO - Argument --verbosity: INFO
Traceback (most recent call last):
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/bin/illumiprocessor", line 17, in <module>
    sys.exit(main())
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/cli/main.py", line 114, in main
    main(args)
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/main.py", line 29, in main
    conf.read(args.config)
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/configparser.py", line 697, in read
    self._read(fp, filename)
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/configparser.py", line 1080, in _read
    raise MissingSectionHeaderError(fpname, lineno, line)
configparser.MissingSectionHeaderError: File contains no section headers.

To make it work, I tried a few things that I found on Stack Overflow and GitHub, but none of them worked: 1 - replacing "-" with "*" in the adapters section, because some macOS users said this caused trouble for them; 2 - replacing ":" with "="; 3 - changing the encoding from UTF-8 to ASCII when saving my .conf file. I suspect the issue is related to the formatting of the .conf file. I am not able to save it as .conf, only as .rtf in TextEdit or as .csv in Numbers, and then I manually change the extension to .conf in Terminal (using the 'mv' command). That said, do you have any suggestions about the error, or about a different way of creating a .conf file?

Cheers, Mariana Mazzochi
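
One plausible cause, given the TextEdit workflow described above: renaming a file with mv changes only its name, not its contents, so a file saved as .rtf still begins with RTF markup instead of a section header such as [adapters]. The following is a minimal Python sketch for checking this, assuming the config is read with the standard-library configparser (as the traceback suggests). The function name diagnose_conf and its return labels are hypothetical and are not part of illumiprocessor.

```python
import configparser


def diagnose_conf(path):
    """Return a short label describing why a config file may fail to parse."""
    # RTF documents start with the bytes {, backslash, r, t, f,
    # whatever extension the file has been renamed to.
    with open(path, "rb") as handle:
        if handle.read(5) == b"{\\rtf":
            return "rtf-markup"

    parser = configparser.ConfigParser()
    try:
        with open(path, encoding="utf-8") as handle:
            parser.read_file(handle)
    except configparser.MissingSectionHeaderError:
        # The first meaningful line is not a [section] header.
        return "no-section-header"
    except configparser.Error as error:
        return "other-parse-error: {}".format(type(error).__name__)
    return "ok"
```

Calling diagnose_conf("illumiprocessor.conf") before running illumiprocessor would show whether the file is plain text with section headers. It does not validate the adapter or tag contents.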

brantfaircloth commented 1 year ago

The config file needs to be formatted as described in the illumiprocessor manual. It might be best to use a free text editor (VSCode is one) to format the file and save it as ".conf" as a text file in UTF-8 format. As mentioned previously, once the file is formatted correctly, if you still have problems, it is almost impossible to diagnose them without some example data.

The example file you provided above seems incorrect in a number of ways - from BCBCBC being in the adapter sequences to multiple entries on a single line (e.g. as in i5P01WG01 = TCTACTCT i7P01WG01 = CACCTTAC).

marianamazzochi commented 1 year ago

Thanks, Brant. I followed the Illumiprocessor manual, but I haven't tried VSCode. I will try it, if it's available for MacOS.

The multiple entries on a single line happened when I pasted the text on GitHub. This is not happening in the original file.

However, I also found it weird when I noticed the BCBCBC fragment in the middle of the adapter sequences, but I downloaded this .conf file from an online example, which was supposed to work. Have you posted any example .conf files online that I could try to get working? I tried to open what was supposed to be an online folder that you provided in an old issue here on GitHub, but the page showed a 404 error. I think the folder is not available anymore. If you have any new example files that I could use, I would be very glad to try them. I am new to genomics and Python, so I am still learning until my final sequences come back from Rapid Genomics.

Furthermore, I would like to congratulate you on all your work, and especially on making it available online. This is, indeed, quite admirable.

brantfaircloth commented 1 year ago

Gotcha. And, you bet - take a look here at this config file that is used for testing the illumiprocessor software - it should give you an idea of what the file should look like.

marianamazzochi commented 1 year ago

Brant, thanks for all your help. I managed to download your raw reads and also used your .conf file, and it appears to have worked out. However, a new error is showing up: 'FileNotFoundError: [Errno 2] No such file or directory: '/Users/marimazzochi/Desktop/2_clean-fastq/fake-truht1/split-adapter-quality-trimmed/fake-truht1-READ1-single.fastq.gz' -> '/Users/marimazzochi/Desktop/2_clean-fastq/fake-truht1/split-adapter-quality-trimmed/fake-truht1-READ-singleton.fastq.gz'

Do you know what could have happened? Cheers,

brantfaircloth commented 1 year ago

It seems something weird is happening when you run the program. That said, I need more information to be able to help you (how the directories were set up, what command you used to run illumiprocessor, etc.). Otherwise I'm just guessing at the issues.

When I use the test config file with the test data, downloaded to a directory where the directory structure is like this (reads in "raw-reads" directory; conf file in tru-seq-ht.conf):

.
├── raw-reads
│   ├── fake-truht_S1_L001_R1_001.fastq.gz
│   ├── fake-truht_S1_L001_R2_001.fastq.gz
│   ├── fake-truht_S2_L001_R1_001.fastq.gz
│   └── fake-truht_S2_L001_R2_001.fastq.gz
└── tru-seq-ht.conf

and I run illumiprocessor like this:

illumiprocessor --input raw-reads --output clean-reads --config tru-seq-ht.conf

the program runs and completes as expected.

marianamazzochi commented 1 year ago

Yes, the directory structure is like this. I am working on my Desktop - the raw reads are in a folder, and the config file is outside this folder. The command I used was:

illumiprocessor \
    --input 1_raw-fastq \
    --output 2_clean-fastq \
    --config illumiprocessor.conf \
    --cores 4

The only different thing that happened was that when I tried to save my .conf file in VSCode, there was no option to save as .conf, or even to select UTF-8 encoding. So I couldn't do that, and I had to save it as .txt and then manually change the extension to .conf in the Terminal.

brantfaircloth commented 1 year ago

Weird question - are you sure that the fastq files were downloaded correctly? If you download those incorrectly from github, you can actually get html files. As an aside, when saving a file as a "conf" file in VSCode, you should just be able to enter "my_file.conf" for the file name in the "Save As" box. No need to save as .txt and then convert.

Please paste the entire output to the screen from the moment you start illumiprocessor to the end of the error that you see (and any text that happens to follow that).
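
Checking the file extension alone does not rule out the HTML-download problem mentioned above, because a web page saved under a .fastq.gz name keeps that name. Here is a minimal Python sketch that inspects the content instead. It assumes only that gzip files start with the standard magic bytes 1f 8b and that a FASTQ record starts with "@"; the function name looks_like_fastq is hypothetical.

```python
import gzip


def looks_like_fastq(path):
    """Return True if the first line of the file starts like a FASTQ record."""
    # gzip-compressed files begin with the two magic bytes 0x1f 0x8b.
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == b"\x1f\x8b"

    opener = gzip.open if is_gzip else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        first_line = handle.readline()

    # FASTQ headers start with "@"; an HTML page would start with "<".
    return first_line.startswith("@")
```

This only looks at the first line, so it catches a wrong download but not a truncated or otherwise corrupted file.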

marianamazzochi commented 1 year ago

I downloaded the fastq files and checked if their extensions are really .fastq, and yes, they are. About VSCode, I have just saved the .conf file exactly as you said, and tried to run it again. The entire output of the error is the same:

[WARNING] Output directory exists, REMOVE [Y/n]? "Y"
2022-10-21 14:23:42,500 - illumiprocessor - INFO - ==================== Starting illumiprocessor ===================
2022-10-21 14:23:42,500 - illumiprocessor - INFO - Version: 2.10
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --config: illumiprocessor.conf
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --cores: 4
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --input: /Users/marimazzochi/Desktop/1_raw-fastq
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --log_path: None
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --min_len: 40
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --no_merge: False
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --output: /Users/marimazzochi/Desktop/2_clean-fastq
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --phred: phred33
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --r1_pattern: None
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --r2_pattern: None
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --se: False
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --trimmomatic: /Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/bin/trimmomatic
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --verbosity: INFO
2022-10-21 14:23:42,544 - illumiprocessor - INFO - Trimming samples with Trimmomatic
Running
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/core.py", line 307, in runner
    trimmomatic_merger(sample)
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/core.py", line 294, in trimmomatic_merger
    os.rename(singles[0], new_pth)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/marimazzochi/Desktop/2_clean-fastq/fake-truht1/split-adapter-quality-trimmed/fake-truht1-READ1-single.fastq.gz' -> '/Users/marimazzochi/Desktop/2_clean-fastq/fake-truht1/split-adapter-quality-trimmed/fake-truht1-READ-singleton.fastq.gz'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/bin/illumiprocessor", line 17, in <module>
    sys.exit(main())
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/cli/main.py", line 114, in main
    main(args)
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/main.py", line 45, in main
    pool.map(core.runner, work)
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
FileNotFoundError: [Errno 2] No such file or directory: '/Users/marimazzochi/Desktop/2_clean-fastq/fake-truht1/split-adapter-quality-trimmed/fake-truht1-READ1-single.fastq.gz' -> '/Users/marimazzochi/Desktop/2_clean-fastq/fake-truht1/split-adapter-quality-trimmed/fake-truht1-READ-singleton.fastq.gz'

brantfaircloth commented 1 year ago

I think I understand the reason you are seeing this error, but I still cannot reproduce it. That said, this error should not cause a problem when you analyze your "real" data once you have your config file set up correctly.

If you would like to test with some additional data, where the problem you are seeing should disappear, you can follow the tutorial for phyluce, e.g. as in https://phyluce.readthedocs.io/en/latest/tutorials/tutorial-1.html#clean-the-read-data.

brantfaircloth commented 1 year ago

That said, could you also ensure that when you run:

/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/bin/trimmomatic

it outputs something like:

Usage:
       PE [-version] [-threads <threads>] [-phred33|-phred64] [-trimlog <trimLogFile>] [-summary <statsSummaryFile>] [-quiet] [-validatePairs] [-basein <inputBase> | <inputFile1> <inputFile2>] [-baseout <outputBase> | <outputFile1P> <outputFile1U> <outputFile2P> <outputFile2U>] <trimmer1>...
   or:
       SE [-version] [-threads <threads>] [-phred33|-phred64] [-trimlog <trimLogFile>] [-summary <statsSummaryFile>] [-quiet] <inputFile> <outputFile> <trimmer1>...
   or:
       -version

marianamazzochi commented 1 year ago

Thanks, Brant. I will follow the mentioned tutorial and I hope it works out. About your second reply, yes, the output is:

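Usage:
       PE [-version] [-threads <threads>] [-phred33|-phred64] [-trimlog <trimLogFile>] [-summary <statsSummaryFile>] [-quiet] [-validatePairs] [-basein <inputBase> | <inputFile1> <inputFile2>] [-baseout <outputBase> | <outputFile1P> <outputFile1U> <outputFile2P> <outputFile2U>] <trimmer1>...
   or:
       SE [-version] [-threads <threads>] [-phred33|-phred64] [-trimlog <trimLogFile>] [-summary <statsSummaryFile>] [-quiet] <inputFile> <outputFile> <trimmer1>...
   or:
       -version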

marianamazzochi commented 1 year ago

Brant, I've just tried the mentioned tutorial and I got the same error once again...

[WARNING] Output directory exists, REMOVE [Y/n]? "Y"
2022-10-21 15:32:52,469 - illumiprocessor - INFO - ==================== Starting illumiprocessor ===================
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Version: 2.10
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --config: illumiprocessor.conf
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --cores: 4
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --input: /Users/marimazzochi/Desktop/1_raw-fastq
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --log_path: None
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --min_len: 40
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --no_merge: False
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --output: /Users/marimazzochi/Desktop/2_clean-fastq
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --phred: phred33
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --r1_pattern: None
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --r2_pattern: None
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --se: False
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --trimmomatic: /Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/bin/trimmomatic
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --verbosity: INFO
2022-10-21 15:32:52,513 - illumiprocessor - INFO - Trimming samples with Trimmomatic
Running
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/core.py", line 307, in runner
    trimmomatic_merger(sample)
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/core.py", line 294, in trimmomatic_merger
    os.rename(singles[0], new_pth)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/marimazzochi/Desktop/2_clean-fastq/alligator_mississippiensis/split-adapter-quality-trimmed/alligator_mississippiensis-READ1-single.fastq.gz' -> '/Users/marimazzochi/Desktop/2_clean-fastq/alligator_mississippiensis/split-adapter-quality-trimmed/alligator_mississippiensis-READ-singleton.fastq.gz'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/bin/illumiprocessor", line 17, in <module>
    sys.exit(main())
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/cli/main.py", line 114, in main
    main(args)
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/main.py", line 45, in main
    pool.map(core.runner, work)
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
FileNotFoundError: [Errno 2] No such file or directory: '/Users/marimazzochi/Desktop/2_clean-fastq/alligator_mississippiensis/split-adapter-quality-trimmed/alligator_mississippiensis-READ1-single.fastq.gz' -> '/Users/marimazzochi/Desktop/2_clean-fastq/alligator_mississippiensis/split-adapter-quality-trimmed/alligator_mississippiensis-READ-singleton.fastq.gz'

brantfaircloth commented 1 year ago

Will you try this using the attached (illumiprocessor.txt) and by changing your input folder name to raw-fastq. Then run:

illumiprocessor \
    --input raw-fastq/ \
    --output clean-fastq \
    --config illumiprocessor.txt \
    --cores 1

and let me know if that works?

marianamazzochi commented 1 year ago

Brant, I did it (and used the config file with the extension .txt, right?) and I got this error:

[WARNING] Output directory exists, REMOVE [Y/n]? "Y"
2022-10-21 16:01:57,458 - illumiprocessor - INFO - ==================== Starting illumiprocessor ===================
2022-10-21 16:01:57,458 - illumiprocessor - INFO - Version: 2.10
2022-10-21 16:01:57,458 - illumiprocessor - INFO - Argument --config: illumiprocessor.txt
2022-10-21 16:01:57,458 - illumiprocessor - INFO - Argument --cores: 1
2022-10-21 16:01:57,458 - illumiprocessor - INFO - Argument --input: /Users/marimazzochi/Desktop/raw-fastq
2022-10-21 16:01:57,458 - illumiprocessor - INFO - Argument --log_path: None
2022-10-21 16:01:57,458 - illumiprocessor - INFO - Argument --min_len: 40
2022-10-21 16:01:57,458 - illumiprocessor - INFO - Argument --no_merge: False
2022-10-21 16:01:57,458 - illumiprocessor - INFO - Argument --output: /Users/marimazzochi/Desktop/clean-fastq
2022-10-21 16:01:57,458 - illumiprocessor - INFO - Argument --phred: phred33
2022-10-21 16:01:57,458 - illumiprocessor - INFO - Argument --r1_pattern: None
2022-10-21 16:01:57,458 - illumiprocessor - INFO - Argument --r2_pattern: None
2022-10-21 16:01:57,458 - illumiprocessor - INFO - Argument --se: False
2022-10-21 16:01:57,459 - illumiprocessor - INFO - Argument --trimmomatic: /Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/bin/trimmomatic
2022-10-21 16:01:57,459 - illumiprocessor - INFO - Argument --verbosity: INFO
2022-10-21 16:01:57,464 - illumiprocessor - INFO - Trimming samples with Trimmomatic
Running
Traceback (most recent call last):
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/bin/illumiprocessor", line 17, in <module>
    sys.exit(main())
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/cli/main.py", line 114, in main
    main(args)
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/main.py", line 47, in main
    list(map(core.runner, work))
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/core.py", line 307, in runner
    trimmomatic_merger(sample)
  File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/core.py", line 294, in trimmomatic_merger
    os.rename(singles[0], new_pth)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/marimazzochi/Desktop/clean-fastq/alligator_mississippiensis/split-adapter-quality-trimmed/alligator_mississippiensis-READ1-single.fastq.gz' -> '/Users/marimazzochi/Desktop/clean-fastq/alligator_mississippiensis/split-adapter-quality-trimmed/alligator_mississippiensis-READ-singleton.fastq.gz'

brantfaircloth commented 1 year ago

ok - we've reached the point where I'm not entirely sure I can help you fix the error. Everything that I've sent you, I've run on my own machine and everything appears to be working just fine. It seems like your machine is not allowing you to create files or directories like clean-fastq/alligator_mississippiensis/split-adapter-quality-trimmed/ within /Users/marimazzochi/Desktop/, but I cannot determine why that would be occurring.

You have a couple of options: (1) you can try another machine (perhaps linux) or (2) you could trim your read files manually using trimmomatic or some other program.
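
The hypothesis above (that nested output directories cannot be created on the Desktop) can be tested directly. The following is a rough Python sketch, not anything illumiprocessor does itself: the helper name can_create_nested is hypothetical, and note that it really does create the directories it tests and leaves them in place.

```python
import os
import tempfile


def can_create_nested(base, parts):
    """Try to create base/part1/part2/... and write a scratch file inside it."""
    target = os.path.join(base, *parts)
    try:
        # Creates every missing directory on the way down to `target`.
        os.makedirs(target, exist_ok=True)
        # The scratch file is removed automatically when the block exits.
        with tempfile.TemporaryFile(dir=target) as scratch:
            scratch.write(b"test")
    except OSError:
        return False
    return True
```

For example, can_create_nested("/Users/marimazzochi/Desktop", ["clean-fastq", "alligator_mississippiensis", "split-adapter-quality-trimmed"]) returning True would suggest that permissions are not the problem, and that the missing "-READ1-single.fastq.gz" file has some other cause.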

marianamazzochi commented 1 year ago

That's very helpful, Brant. I also don't understand why this is happening, but a colleague told me that working with Python on macOS can be tricky sometimes. That said, I will try to reproduce all these tests on a Linux machine next week. When I do, I will come back to this issue to tell you whether this was really a problem with my machine (which I also think it is). Thanks again. Cheers!

brantfaircloth commented 1 year ago

What's weird is that I've run all of these tests on MacOS (same version as you) using Phyluce 1.7.1 (the newest version). Please do let me know if working on linux fixes the issue.

marianamazzochi commented 1 year ago

Brant, I plan to run the test on Linux on Thursday, which is when I will be able to use the machine at the lab. However, yesterday Rapid Genomics returned my raw fastq data, and the adapters also contain BCBCBCBC in the middle, just like the first example I found online, which you also thought was weird. They said "The adapters used are below, "BCBCBCBC" stands for the barcodes." Given this, I would like to know if you have worked with this type of adapter and if you know what to do with the barcodes. Should I simply remove the BCBCBC part?

Thanks,

brantfaircloth commented 1 year ago

This may simply indicate that the barcode sequences replace the BCBCBCBC of the adapter sequence. You'll need to ask them to be sure. It may be that they have actually replaced the barcode/index sequence with BCBCBCBC for each sample, but I have absolutely no idea why they would do that. They should be able to give you a file of the i5 and i7 indexes used and the combinations of those that were applied to each of your samples.

marianamazzochi commented 1 year ago

Yes, apparently they have replaced the actual barcode with BCBCBC for each sample, and I have figured out that all my samples have different barcodes for i7 (and all of them have the same one for i5). In this case, and considering that the pipeline we're working with only allows us to give one i7 and one i5 adapter, do you know how I could proceed? Will I need to perform the trimming once for each sample?

I have already asked them too. Hope they answer soon. Thanks again,

brantfaircloth commented 1 year ago

You can make any combination of multiple i7s and multiple i5s that you need to - you are not limited to only one i7 combined with one i5. For example, the following shows two different combinations of i5 and i7 used with each sample:

[tag sequences]
i7-N701:GCTACGCT
i7-N702:GGACTCCT
i5-N501:TAGATCGC
i5-N502:CTCTCTAT

[tag map]
morelia-viridis1_GCCTTCA:i7-N701,i5-N501
cnemidophorus-sexlineatus1_GGTACGC:i7-N702,i5-N502

That said, this assumes you know which indexes were applied to which samples.
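
To see how the two sections fit together, here is a minimal Python sketch that resolves each sample in [tag map] to its index sequences from [tag sequences]. It uses the standard-library configparser on the example above. The function name resolve_tags is hypothetical, and this is an illustration of the file layout rather than illumiprocessor's own parsing code.

```python
import configparser

EXAMPLE_CONF = """
[tag sequences]
i7-N701:GCTACGCT
i7-N702:GGACTCCT
i5-N501:TAGATCGC
i5-N502:CTCTCTAT

[tag map]
morelia-viridis1_GCCTTCA:i7-N701,i5-N501
cnemidophorus-sexlineatus1_GGTACGC:i7-N702,i5-N502
"""


def resolve_tags(conf_text):
    """Map each sample name to the list of index sequences assigned to it."""
    parser = configparser.ConfigParser()
    # configparser lowercases keys by default; keep names such as i7-N701 as written.
    parser.optionxform = str
    parser.read_string(conf_text)

    sequences = parser["tag sequences"]
    resolved = {}
    for sample, tag_names in parser["tag map"].items():
        # A KeyError here means the tag map names a tag that was never defined.
        resolved[sample] = [sequences[name.strip()] for name in tag_names.split(",")]
    return resolved
```

Running resolve_tags on a draft config is a quick way to catch a tag name in [tag map] that is misspelled or missing from [tag sequences].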

You could also, if you want to, just trim the data using trimmomatic for each set of files, using the first part of the adapter sequences for i5 and i7 as the sequence to trim - put that in a file named adapters.fa like so:

>adap1
GATCGGAAGAGCACACGTCTGAACTCCAGTCAC
>adap2
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

Then run:

trimmomatic PE -phred33 \
<read1_file_name>_R1_001.fastq.gz \
<read2_file_name>_R2_001.fastq.gz \
<output_read1_file_name>_R1_001.trim.fastq.gz \
<output_read1_file_name>_R1_001.unpair.fastq.gz \
<output_read2_file_name>_R2_001.trim.fastq.gz \
<output_read2_file_name>_R2_001.unpair.fastq.gz \
ILLUMINACLIP:adapters.fa:2:30:10 LEADING:5 TRAILING:15 SLIDINGWINDOW:4:15 MINLEN:40

marianamazzochi commented 1 year ago

Yes, that makes sense. Thanks for your patience. But I think what I haven't really understood is the adapters with BCBCBC. E.g., my i7 adapter looks like this: GATCGGAAGAGCACACGTCTGAACTCCAGTCAC-BCBCBCBC-ATCTCGTATGCCGTCTTCTGCTTG

Do you think they would work if I just replaced the -BCBCBCBC- with *, as in your example config file (the one we've talked about in this issue)?

brantfaircloth commented 1 year ago

I am not sure - I somewhat doubt that RapidGenomics searched through all your sequence data and replaced every index sequence with BCBCBCBC in any reads where adapter contamination occurred, but I'm not really sure what they've done (or why they would do what they've done to begin with). You'll just need to find out from them.

That said, if you trim manually with trimmomatic, it won't matter - the manual way searches for only the first part of the adapter sequence, regardless of the index.
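
Since the manual route means one trimmomatic call per sample, the command can be generated rather than typed by hand. Below is a small Python sketch that builds the argument list for each sample, using only the options shown in the command earlier in this thread. The function name trimmomatic_command, the sample names, and the "<sample>_R1_001.fastq.gz" naming scheme are assumptions for illustration; adjust them to the real file names.

```python
import shlex

# Trimming steps copied from the manual command earlier in the thread.
TRIM_STEPS = ["LEADING:5", "TRAILING:15", "SLIDINGWINDOW:4:15", "MINLEN:40"]


def trimmomatic_command(sample, adapters="adapters.fa"):
    """Build the paired-end trimmomatic argument list for one sample."""
    reads = ["{}_{}_001.fastq.gz".format(sample, read) for read in ("R1", "R2")]
    outputs = []
    for read in ("R1", "R2"):
        # Order expected by trimmomatic PE: paired output, then unpaired output, per read.
        outputs.append("{}_{}_001.trim.fastq.gz".format(sample, read))
        outputs.append("{}_{}_001.unpair.fastq.gz".format(sample, read))
    clip = "ILLUMINACLIP:{}:2:30:10".format(adapters)
    return ["trimmomatic", "PE", "-phred33"] + reads + outputs + [clip] + TRIM_STEPS


if __name__ == "__main__":
    # Hypothetical sample names; print each command so it can be reviewed before running.
    for sample in ("sample1", "sample2"):
        print(" ".join(shlex.quote(part) for part in trimmomatic_command(sample)))
```

The printed commands can be pasted into a shell, or each list can be passed to subprocess.run(command, check=True) once the file names have been checked.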

marianamazzochi commented 1 year ago

Just to make sure that I'm being clear enough, I am attaching a screenshot of the .csv sent by RG. Besides this file, they also sent me the i5 and i7 adapters pasted into Gmail.

[Attached image: Screen Shot 2022-10-25 at 14 38 38]

brantfaircloth commented 1 year ago

I don't see where in any of this the indexes are equal to BCBCBCBC. The index sequences for each sample are given in Columns D and E. So, the combo for the first and second samples are:

[tag sequences]
i5-507:ACGTCCTG
i7-97:ACTTGGCT
i7-98:TTACGTGC

[tag map]
sample1:i7-97,i5-507
sample2:i7-98,i5-507

marianamazzochi commented 1 year ago

Yes, you're right - I think we are misunderstanding each other - but the provided adapters have -BCBCBC- in the middle. So I was wondering if I could replace the -BCBCBC- with *, as in your last example config file. As we saw before, I tried to run another example that had BCBCBC and Illumiprocessor wasn't able to read it. So I thought about making this replacement. What do you think?

Thanks again!

brantfaircloth commented 1 year ago

The first part of your configuration file should look like this (which is explained in the documentation for using illumiprocessor):

[adapters]
i7:GATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG
i5:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT*GTGTAGATCTCGGTGGTCGCCGTATCATT

[tag sequences]
i5-507:ACGTCCTG
i7-97:ACTTGGCT
i7-98:TTACGTGC

[tag map]
sample1:i7-97,i5-507
sample2:i7-98,i5-507

marianamazzochi commented 1 year ago

Yes, thanks for copying it here. I am following this exact documentation, which is really great. I started this issue with an example that also had the -BCBCBCBC- fragment in the middle of the adapters, and I tried to run it with Illumiprocessor, which didn't work. When my raw data arrived (yesterday), Rapid Genomics sent the adapters with -BCBCBCBC- too, and I was wondering if it would be OK to remove this part of the adapters and replace it with *, just as in the example you copied here.

brantfaircloth commented 1 year ago

Yes, that is what you should do if you want to use illumiprocessor.
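
The replacement confirmed above can be done mechanically. A minimal Python sketch, assuming the vendor writes the barcode position as a run of "BC" pairs, optionally wrapped in hyphens (both forms appear in this thread); the function name star_adapter is hypothetical:

```python
import re

# A run of two or more "BC" pairs, with an optional hyphen on either side.
BARCODE_PLACEHOLDER = re.compile(r"-?(?:BC){2,}-?")


def star_adapter(adapter):
    """Replace the vendor's barcode placeholder with the * used in [adapters]."""
    return BARCODE_PLACEHOLDER.sub("*", adapter, count=1)


if __name__ == "__main__":
    i7 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC-BCBCBCBC-ATCTCGTATGCCGTCTTCTGCTTG"
    print(star_adapter(i7))
```

For the i7 adapter quoted earlier, this yields the same string as the i7 line in the [adapters] example above. The per-sample index sequences still have to be listed separately under [tag sequences].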

brantfaircloth commented 1 year ago

I'm sorry - I don't. If you upload your config file and send me the file names in the raw-fastq directory/folder, I can make a test example to check to see what the issue might be.

marianamazzochi commented 1 year ago

Yes, of course. I have just replaced the complicated names from Rapid Genomics with something simpler in order to make it work, but now I am getting a different error:

[WARNING] Output directory exists, REMOVE [Y/n]? "Y"
2022-10-27 14:39:52,829 - illumiprocessor - INFO - ==================== Starting illumiprocessor ===================
2022-10-27 14:39:52,830 - illumiprocessor - INFO - Version: 2.10
2022-10-27 14:39:52,830 - illumiprocessor - INFO - Argument --config: illumiprocessor.conf
2022-10-27 14:39:52,830 - illumiprocessor - INFO - Argument --cores: 1
2022-10-27 14:39:52,830 - illumiprocessor - INFO - Argument --input: /home/aline/UCEs_Mari/test/raw-fastq
2022-10-27 14:39:52,830 - illumiprocessor - INFO - Argument --log_path: None
2022-10-27 14:39:52,830 - illumiprocessor - INFO - Argument --min_len: 40
2022-10-27 14:39:52,830 - illumiprocessor - INFO - Argument --no_merge: False
2022-10-27 14:39:52,830 - illumiprocessor - INFO - Argument --output: /home/aline/UCEs_Mari/test/clean-fastq
2022-10-27 14:39:52,830 - illumiprocessor - INFO - Argument --phred: phred33
2022-10-27 14:39:52,831 - illumiprocessor - INFO - Argument --r1_pattern: None
2022-10-27 14:39:52,831 - illumiprocessor - INFO - Argument --r2_pattern: None
2022-10-27 14:39:52,831 - illumiprocessor - INFO - Argument --se: False
2022-10-27 14:39:52,831 - illumiprocessor - INFO - Argument --trimmomatic: /home/aline/.conda/envs/phyluce-1.7.1/bin/trimmomatic
2022-10-27 14:39:52,831 - illumiprocessor - INFO - Argument --verbosity: INFO
Traceback (most recent call last):
  File "/home/aline/.conda/envs/phyluce-1.7.1/bin/illumiprocessor", line 17, in <module>
    sys.exit(main())
  File "/home/aline/.conda/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/cli/main.py", line 114, in main
    main(args)
  File "/home/aline/.conda/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/main.py", line 34, in main
    reads.append(core.SequenceData(args, conf, start_name, end_name))
  File "/home/aline/.conda/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/core.py", line 85, in __init__
    self._get_read_data()
  File "/home/aline/.conda/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/core.py", line 106, in _get_read_data
    "errors in your conf file.".format(self.start_name)
OSError: There is a problem with the read names for SULA1. Ensure you do not have spelling/capitalization errors in your conf file.

My conf file is attached, along with a screenshot of the folder that contains my fastq files. P.S.: I've tried to work with a .txt and a .conf file. Neither of them worked.

[Attachments: Captura de tela de 2022-10-27 14-41-59 (screenshot), illumiprocessor.txt]

brantfaircloth commented 1 year ago

Using the attached config file (test.txt) everything works well.

You are seeing these errors because your fastq file names are not what illumiprocessor expects by default. So, you need to adjust the regular expression to find the files that you are trimming. You do this by running illumiprocessor with the --r1-pattern and --r2-pattern options, like:

illumiprocessor \
--input raw-fastq \
--output clean-fastq \
--config test.conf \
--r1-pattern "{}_(R1|READ1|Read1|read1).fastq(?:.gz)*" \
--r2-pattern "{}_(R2|READ2|Read2|read2).fastq(?:.gz)*"

This will find files in raw-fastq that end in .fastq or .fastq.gz, and it substitutes the names from the [names] section to the left of the : into the brackets {}.
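
To preview which files a pattern would pick up before running illumiprocessor, the substitution and match can be tried in plain Python. This is a sketch under assumptions: it uses str.format for the {} substitution and re.match for the test, which may differ in detail from what illumiprocessor does internally, and the file names in the usage example are hypothetical.

```python
import re

R1_PATTERN = "{}_(R1|READ1|Read1|read1).fastq(?:.gz)*"
R2_PATTERN = "{}_(R2|READ2|Read2|read2).fastq(?:.gz)*"


def matching_files(sample, pattern, file_names):
    """Return the file names that match the pattern once {} is filled with the sample name."""
    regex = re.compile(pattern.format(sample))
    # re.match anchors at the start of the name, so the sample prefix must come first.
    return [name for name in file_names if regex.match(name)]


if __name__ == "__main__":
    files = ["SULA1_R1.fastq", "SULA1_R2.fastq", "SULA1_R1.fastq.gz", "SULA10_R1.fastq"]
    print(matching_files("SULA1", R1_PATTERN, files))
    print(matching_files("SULA1", R2_PATTERN, files))
```

An empty list for a sample means the prefix or the read suffix in the pattern does not line up with the real file names, which is the situation the "problem with the read names" error describes. Note that the unescaped dots in the patterns match any character, not only a literal ".".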

marianamazzochi commented 1 year ago

Thanks, Brant. My files end in _R1.fastq and _R2.fastq, so my command looks like this:

illumiprocessor \
    --input raw-fastq \
    --output clean-fastq \
    --config test.conf \
    --r1-pattern "{}_R1.fastq" \
    --r2-pattern "{}_R2.fastq"

Is this correct?

brantfaircloth commented 1 year ago

Assuming your files remain named like your example files, either what I sent or what you suggest should work (what I sent makes several items "optional" - so it looks more complicated but matches your example name just fine).

marianamazzochi commented 1 year ago

This is really weird, because I keep getting the same error. :( I tried it with your command and with mine, and neither of them worked out.

illumiprocessor \
--input raw-fastq \
--output clean-fastq \
--config test.txt \
--r1-pattern "{}_(R1|READ1|Read1|read1).fastq(?:.gz)*" \
--r2-pattern "{}_(R2|READ2|Read2|read2).fastq(?:.gz)*"

[WARNING] Output directory exists, REMOVE [Y/n]? "Y"
2022-10-27 15:19:44,917 - illumiprocessor - INFO - ==================== Starting illumiprocessor ===================
2022-10-27 15:19:44,918 - illumiprocessor - INFO - Version: 2.10
2022-10-27 15:19:44,918 - illumiprocessor - INFO - Argument --config: test.txt
2022-10-27 15:19:44,918 - illumiprocessor - INFO - Argument --cores: 1
2022-10-27 15:19:44,918 - illumiprocessor - INFO - Argument --input: /home/aline/UCEs_Mari/test/raw-fastq
2022-10-27 15:19:44,918 - illumiprocessor - INFO - Argument --log_path: None
2022-10-27 15:19:44,918 - illumiprocessor - INFO - Argument --min_len: 40
2022-10-27 15:19:44,918 - illumiprocessor - INFO - Argument --no_merge: False
2022-10-27 15:19:44,918 - illumiprocessor - INFO - Argument --output: /home/aline/UCEs_Mari/test/clean-fastq
2022-10-27 15:19:44,918 - illumiprocessor - INFO - Argument --phred: phred33
2022-10-27 15:19:44,919 - illumiprocessor - INFO - Argument --r1_pattern: {}_(R1|READ1|Read1|read1).fastq(?:.gz)*
2022-10-27 15:19:44,919 - illumiprocessor - INFO - Argument --r2_pattern: {}_(R2|READ2|Read2|read2).fastq(?:.gz)*
2022-10-27 15:19:44,919 - illumiprocessor - INFO - Argument --se: False
2022-10-27 15:19:44,920 - illumiprocessor - INFO - Argument --trimmomatic: /home/aline/.conda/envs/phyluce-1.7.1/bin/trimmomatic
2022-10-27 15:19:44,920 - illumiprocessor - INFO - Argument --verbosity: INFO
Traceback (most recent call last):
File "/home/aline/.conda/envs/phyluce-1.7.1/bin/illumiprocessor", line 17, in sys.exit(main())
File "/home/aline/.conda/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/cli/main.py", line 114, in main main(args)
File "/home/aline/.conda/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/main.py", line 34, in main reads.append(core.SequenceData(args, conf, start_name, end_name))
File "/home/aline/.conda/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/core.py", line 85, in __init__ self._get_read_data()
File "/home/aline/.conda/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/core.py", line 106, in _get_read_data "errors in your conf file.".format(self.start_name)
OSError: There is a problem with the read names for SULA1. Ensure you do not have spelling/capitalization errors in your conf file.
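One way to hunt for the spelling/capitalization problems this error message mentions is to compare the [names] entries against the file names without regard to case. The sketch below is a hypothetical helper, not part of illumiprocessor. The conf text and file names are made up, and it assumes that each read file starts with the name on the left-hand side of a [names] entry.

```python
import configparser


def capitalization_mismatches(conf_text, filenames):
    """Return (name, candidate_files) pairs where only the letter case differs.

    Hypothetical diagnostic: a name is reported when no file starts with it
    exactly, but at least one file does once case is ignored.
    """
    parser = configparser.ConfigParser()
    # configparser lower-cases option names by default; keep them as written.
    parser.optionxform = str
    parser.read_string(conf_text)

    problems = []
    for name in parser["names"]:
        exact = [f for f in filenames if f.startswith(name)]
        loose = [f for f in filenames if f.lower().startswith(name.lower())]
        if not exact and loose:
            problems.append((name, loose))
    return problems


# Made-up example: the conf says SULA1, the files on disk say Sula1.
example_conf = "[names]\nSULA1 = sula-w1\n"
example_files = ["Sula1_R1.fastq", "Sula1_R2.fastq"]
print(capitalization_mismatches(example_conf, example_files))
```

An empty list means no case-only mismatch was found, so the problem would lie elsewhere, for example in the read patterns.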

brantfaircloth commented 1 year ago

I'm not sure what else to try - I cannot repeat the error you are seeing on my machines. Your best bet may be to trim files one-by-one (using trimmomatic or whatever you prefer) and name the resulting files in a way that phyluce expects.

marianamazzochi commented 1 year ago

Brant, I found it, and it was my mistake: there really was a capitalization error. However, now I am getting this one:

[WARNING] Output directory exists, REMOVE [Y/n]? "Y"
2022-10-27 15:32:57,149 - illumiprocessor - INFO - ==================== Starting illumiprocessor ===================
2022-10-27 15:32:57,149 - illumiprocessor - INFO - Version: 2.10
2022-10-27 15:32:57,149 - illumiprocessor - INFO - Argument --config: test.txt
2022-10-27 15:32:57,149 - illumiprocessor - INFO - Argument --cores: 1
2022-10-27 15:32:57,150 - illumiprocessor - INFO - Argument --input: /home/aline/UCEs_Mari/test/raw-fastq
2022-10-27 15:32:57,150 - illumiprocessor - INFO - Argument --log_path: None
2022-10-27 15:32:57,150 - illumiprocessor - INFO - Argument --min_len: 40
2022-10-27 15:32:57,150 - illumiprocessor - INFO - Argument --no_merge: False
2022-10-27 15:32:57,150 - illumiprocessor - INFO - Argument --output: /home/aline/UCEs_Mari/test/clean-fastq
2022-10-27 15:32:57,150 - illumiprocessor - INFO - Argument --phred: phred33
2022-10-27 15:32:57,150 - illumiprocessor - INFO - Argument --r1_pattern: {}_R1.fastq
2022-10-27 15:32:57,150 - illumiprocessor - INFO - Argument --r2_pattern: {}_R2.fastq
2022-10-27 15:32:57,150 - illumiprocessor - INFO - Argument --se: False
2022-10-27 15:32:57,150 - illumiprocessor - INFO - Argument --trimmomatic: /home/aline/.conda/envs/phyluce-1.7.1/bin/trimmomatic
2022-10-27 15:32:57,150 - illumiprocessor - INFO - Argument --verbosity: INFO
2022-10-27 15:32:57,152 - illumiprocessor - INFO - Trimming samples with Trimmomatic
Running
Traceback (most recent call last):
File "/home/aline/.conda/envs/phyluce-1.7.1/bin/illumiprocessor", line 17, in sys.exit(main())
File "/home/aline/.conda/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/cli/main.py", line 114, in main main(args)
File "/home/aline/.conda/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/main.py", line 47, in main list(map(core.runner, work))
File "/home/aline/.conda/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/core.py", line 307, in runner trimmomatic_merger(sample)
File "/home/aline/.conda/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/core.py", line 294, in trimmomatic_merger os.rename(singles[0], new_pth)
FileNotFoundError: [Errno 2] No such file or directory: '/home/aline/UCEs_Mari/test/clean-fastq/sula-w1/split-adapter-quality-trimmed/sula-w1-READ1-single.fastq.gz' -> '/home/aline/UCEs_Mari/test/clean-fastq/sula-w1/split-adapter-quality-trimmed/sula-w1-READ-singleton.fastq.gz'

brantfaircloth commented 1 year ago

Again, I am not seeing this error or able to repeat it. I think you should try to move ahead in another way.

marianamazzochi commented 1 year ago

Thanks for all your help and patience, Brant. The good news is that I just got it to work, although I have no idea what I did to fix it: all I changed was to use the .gz files instead of the unzipped files, and to run my last example command with .gz added at the end. I don't know whether I should be happy or sad, but at least it worked! If you have any idea what could have happened, I would be glad to hear (read) it. Thanks again, and cheers. I hope I can get through the next steps without coming back here to ask you more questions! :)

brantfaircloth commented 1 year ago

Cool 👍