Closed marianamazzochi closed 2 years ago
The config file needs to be formatted as described in the illumiprocessor manual. It might be best to use a free text editor (VSCode is one) to format the file and save it as ".conf" as a text file in UTF-8 format. As mentioned previously, once the file is formatted correctly, if you still have problems, it is almost impossible to diagnose them without some example data.
The example file you provided above seems incorrect in a number of ways - from BCBCBC being in the adapter sequences to multiple entries on a single line (e.g. as in i5P01WG01 = TCTACTCT i7P01WG01 = CACCTTAC).
Thanks, Brant. I followed the Illumiprocessor manual, but I haven't tried VSCode. I will try it, if it's available for MacOS.
The multiple entries on a single line happened when I pasted the text on GitHub. This is not happening in the original file.
However, I also found it weird when I previously noticed the BCBCBC fragment in the middle of the adapter sequences, but I downloaded this .conf file from an online example that was supposed to work. Have you provided any example .conf files online that I could try to get working? I tried to open what was supposed to be an online folder that you provided in an old issue here on GitHub, but the page returned an Error 404. I think the folder is not available anymore. If you have any new example files that I could use, I would be very glad to do so. I am new to genomics and Python, so I am still trying to learn until my final sequences come back from Rapid Genomics.
Furthermore, I would like to congratulate you on all your work, and especially on making it available online. This is, indeed, quite admirable.
Gotcha. And, you bet - take a look here at this config file that is used for testing the illumiprocessor software - it should give you an idea of what the file should look like.
Brant, thanks for all your help. I managed to download your raw reads and also used your .conf file, and it appears to have worked out. However, a new error is showing up: 'FileNotFoundError: [Errno 2] No such file or directory: '/Users/marimazzochi/Desktop/2_clean-fastq/fake-truht1/split-adapter-quality-trimmed/fake-truht1-READ1-single.fastq.gz' -> '/Users/marimazzochi/Desktop/2_clean-fastq/fake-truht1/split-adapter-quality-trimmed/fake-truht1-READ-singleton.fastq.gz'
Do you know what could have happened? Cheers,
It seems something weird is happening when you run the program. That said, I need more information to be able to help you (how the directories were set up, what command you used to run illumiprocessor, etc.). Otherwise I'm just guessing at the issues.
When I use the test config file with the test data, downloaded to a directory where the directory structure is like this (reads in "raw-reads" directory; conf file in tru-seq-ht.conf):
.
├── raw-reads
│ ├── fake-truht_S1_L001_R1_001.fastq.gz
│ ├── fake-truht_S1_L001_R2_001.fastq.gz
│ ├── fake-truht_S2_L001_R1_001.fastq.gz
│ └── fake-truht_S2_L001_R2_001.fastq.gz
└── tru-seq-ht.conf
and I run illumiprocessor like this:
illumiprocessor --input raw-reads --output clean-reads --config tru-seq-ht.conf
the program runs and completes as expected.
Yes, the directory structure is like this. I am working at my Desktop - the raw reads are in a folder, and the config file is outside this folder. The command I used was:
illumiprocessor \
--input 1_raw-fastq \
--output 2_clean-fastq \
--config illumiprocessor.conf \
--cores 4
The only different thing that happened was that when I tried to save my .conf file in VSCode, there was no option to save as .conf, or even to select UTF-8 encoding. So I couldn't do that, and I had to save it as .txt and then manually change the extension to .conf in the Terminal.
Weird question - are you sure that the fastq files were downloaded correctly? If you download those incorrectly from GitHub, you can actually get HTML files. As an aside, when saving a file as a "conf" file in VSCode, you should just be able to enter "my_file.conf" for the file name in the "Save As" box. No need to save as .txt and then convert.
Please paste the entire output to the screen from the moment you start illumiprocessor to the end of the error that you see (and any text that happens to follow that).
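One way to rule out the "HTML saved under a fastq name" case is to look at the first bytes of each file. The snippet below is only a sketch: the function name and the throwaway demo files are made up for illustration, and it checks just two things (the gzip magic bytes and a leading "@" on the first record), so it is not a full FASTQ validator.

```python
import gzip
import tempfile
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_like_fastq_gz(path):
    """Return True if the file is gzip-compressed and its first line starts with '@'."""
    path = Path(path)
    with path.open("rb") as handle:
        if handle.read(2) != GZIP_MAGIC:
            return False  # e.g. an HTML page saved under a .fastq.gz name
    with gzip.open(path, "rt") as handle:
        return handle.readline().startswith("@")


# Self-contained demonstration with throwaway files (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good_R1.fastq.gz"
    with gzip.open(good, "wt") as handle:
        handle.write("@read1\nACGT\n+\nIIII\n")

    bad = Path(tmp) / "bad_R1.fastq.gz"
    bad.write_text("<!DOCTYPE html><html>...</html>")

    results = {p.name: looks_like_fastq_gz(p) for p in (good, bad)}
    print(results)
```

Checking the file extension alone cannot catch this, because a mis-downloaded page keeps whatever name it was saved under.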
I downloaded the fastq files and checked if their extensions are really .fastq, and yes, they are. About VSCode, I have just saved the .conf file exactly as you said, and tried to run it again. The entire output of the error is the same:
[WARNING] Output directory exists, REMOVE [Y/n]? "Y"
2022-10-21 14:23:42,500 - illumiprocessor - INFO - ==================== Starting illumiprocessor ===================
2022-10-21 14:23:42,500 - illumiprocessor - INFO - Version: 2.10
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --config: illumiprocessor.conf
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --cores: 4
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --input: /Users/marimazzochi/Desktop/1_raw-fastq
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --log_path: None
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --min_len: 40
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --no_merge: False
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --output: /Users/marimazzochi/Desktop/2_clean-fastq
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --phred: phred33
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --r1_pattern: None
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --r2_pattern: None
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --se: False
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --trimmomatic: /Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/bin/trimmomatic
2022-10-21 14:23:42,501 - illumiprocessor - INFO - Argument --verbosity: INFO
2022-10-21 14:23:42,544 - illumiprocessor - INFO - Trimming samples with Trimmomatic
Runningmultiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/core.py", line 307, in runner
trimmomatic_merger(sample)
File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/core.py", line 294, in trimmomatic_merger
os.rename(singles[0], new_pth)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/marimazzochi/Desktop/2_clean-fastq/fake-truht1/split-adapter-quality-trimmed/fake-truht1-READ1-single.fastq.gz' -> '/Users/marimazzochi/Desktop/2_clean-fastq/fake-truht1/split-adapter-quality-trimmed/fake-truht1-READ-singleton.fastq.gz'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/bin/illumiprocessor", line 17, in
I think I understand the reason you are seeing this error, but I still cannot reproduce it. That said, this error should not cause a problem when you analyze your "real" data once you have your config file set up correctly.
If you would like to test with some additional data, where the problem you are seeing should disappear, you can follow the tutorial for phyluce, e.g. as in https://phyluce.readthedocs.io/en/latest/tutorials/tutorial-1.html#clean-the-read-data.
That said, could you also ensure that when you run:
/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/bin/trimmomatic
it outputs something like:
Usage:
PE [-version] [-threads <threads>] [-phred33|-phred64] [-trimlog <trimLogFile>] [-summary <statsSummaryFile>] [-quiet] [-validatePairs] [-basein <inputBase> | <inputFile1> <inputFile2>] [-baseout <outputBase> | <outputFile1P> <outputFile1U> <outputFile2P> <outputFile2U>] <trimmer1>...
or:
SE [-version] [-threads <threads>] [-phred33|-phred64] [-trimlog <trimLogFile>] [-summary <statsSummaryFile>] [-quiet] <inputFile> <outputFile> <trimmer1>...
or:
-version
Thanks, Brant. I will follow the mentioned tutorial and I hope it works out. About your second reply, yes, the output is:
Usage:
PE [-version] [-threads
Brant, I've just tried the mentioned tutorial and I got the same error once again...
[WARNING] Output directory exists, REMOVE [Y/n]? "Y"
2022-10-21 15:32:52,469 - illumiprocessor - INFO - ==================== Starting illumiprocessor ===================
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Version: 2.10
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --config: illumiprocessor.conf
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --cores: 4
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --input: /Users/marimazzochi/Desktop/1_raw-fastq
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --log_path: None
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --min_len: 40
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --no_merge: False
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --output: /Users/marimazzochi/Desktop/2_clean-fastq
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --phred: phred33
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --r1_pattern: None
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --r2_pattern: None
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --se: False
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --trimmomatic: /Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/bin/trimmomatic
2022-10-21 15:32:52,470 - illumiprocessor - INFO - Argument --verbosity: INFO
2022-10-21 15:32:52,513 - illumiprocessor - INFO - Trimming samples with Trimmomatic
Runningmultiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/core.py", line 307, in runner
trimmomatic_merger(sample)
File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/core.py", line 294, in trimmomatic_merger
os.rename(singles[0], new_pth)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/marimazzochi/Desktop/2_clean-fastq/alligator_mississippiensis/split-adapter-quality-trimmed/alligator_mississippiensis-READ1-single.fastq.gz' -> '/Users/marimazzochi/Desktop/2_clean-fastq/alligator_mississippiensis/split-adapter-quality-trimmed/alligator_mississippiensis-READ-singleton.fastq.gz'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/bin/illumiprocessor", line 17, in
Will you try this using the attached config file (illumiprocessor.txt), after changing your input folder name to raw-fastq? Then run:
illumiprocessor \
--input raw-fastq/ \
--output clean-fastq \
--config illumiprocessor.txt \
--cores 1
and let me know if that works?
Brant, I did it (and used the config file with the extension .txt, right?) and I got this error:
[WARNING] Output directory exists, REMOVE [Y/n]? "Y"
2022-10-21 16:01:57,458 - illumiprocessor - INFO - ==================== Starting illumiprocessor ===================
2022-10-21 16:01:57,458 - illumiprocessor - INFO - Version: 2.10
2022-10-21 16:01:57,458 - illumiprocessor - INFO - Argument --config: illumiprocessor.txt
2022-10-21 16:01:57,458 - illumiprocessor - INFO - Argument --cores: 1
2022-10-21 16:01:57,458 - illumiprocessor - INFO - Argument --input: /Users/marimazzochi/Desktop/raw-fastq
2022-10-21 16:01:57,458 - illumiprocessor - INFO - Argument --log_path: None
2022-10-21 16:01:57,458 - illumiprocessor - INFO - Argument --min_len: 40
2022-10-21 16:01:57,458 - illumiprocessor - INFO - Argument --no_merge: False
2022-10-21 16:01:57,458 - illumiprocessor - INFO - Argument --output: /Users/marimazzochi/Desktop/clean-fastq
2022-10-21 16:01:57,458 - illumiprocessor - INFO - Argument --phred: phred33
2022-10-21 16:01:57,458 - illumiprocessor - INFO - Argument --r1_pattern: None
2022-10-21 16:01:57,458 - illumiprocessor - INFO - Argument --r2_pattern: None
2022-10-21 16:01:57,458 - illumiprocessor - INFO - Argument --se: False
2022-10-21 16:01:57,459 - illumiprocessor - INFO - Argument --trimmomatic: /Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/bin/trimmomatic
2022-10-21 16:01:57,459 - illumiprocessor - INFO - Argument --verbosity: INFO
2022-10-21 16:01:57,464 - illumiprocessor - INFO - Trimming samples with Trimmomatic
RunningTraceback (most recent call last):
File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/bin/illumiprocessor", line 17, in
OK - we've reached the point where I'm not entirely sure I can help you fix the error. Everything that I've sent you, I've run on my own machine and everything appears to be working just fine. It seems like your machine is not allowing you to create files or directories like clean-fastq/alligator_mississippiensis/split-adapter-quality-trimmed/ within /Users/marimazzochi/Desktop/, but I cannot determine why that would be occurring.
You have a couple of options: (1) you can try another machine (perhaps linux) or (2) you could trim your read files manually using trimmomatic or some other program.
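To test the "cannot create files or directories" guess directly, something like the following could be run against the Desktop folder. It is a rough sketch: the function name is invented here, and the demo uses a temporary directory as a stand-in for /Users/marimazzochi/Desktop/ so that it runs anywhere.

```python
import os
import tempfile


def can_create_output_tree(base_dir, relative_path):
    """Try to create nested directories plus one file under base_dir; report success."""
    target = os.path.join(base_dir, relative_path)
    try:
        os.makedirs(target, exist_ok=True)
        probe = os.path.join(target, "write-test.tmp")
        with open(probe, "w") as handle:
            handle.write("ok")
        os.remove(probe)  # clean up the probe file again
        return True
    except OSError as error:
        print(f"Cannot write under {base_dir}: {error}")
        return False


# The temporary directory stands in for the real Desktop path.
with tempfile.TemporaryDirectory() as base:
    ok = can_create_output_tree(
        base, "clean-fastq/alligator_mississippiensis/split-adapter-quality-trimmed"
    )
    print(ok)
```

If this returns True for the real base directory, directory permissions are probably not the cause and the missing "-READ1-single.fastq.gz" file has to be explained some other way.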
That's very helpful, Brant. I also don't understand why this is happening, but I have a colleague who told me that working with Python on MacOS can be tricky sometimes. That said, I will try to reproduce all these tests on a Linux machine next week. When I do, I will come back to this issue to tell you if this was really a problem with my machine (which I also think it is). Thanks again. Cheers!
What's weird is that I've run all of these tests on MacOS (same version as you) using Phyluce 1.7.1 (the newest version). Please do let me know if working on linux fixes the issue.
Brant, I am going to run the test on Linux on Thursday, which is when I will be able to get the machine at the lab. However, yesterday Rapid Genomics returned my raw fastq data, and the adapters also have BCBCBCBC in the middle, just like the first example I found online, which you also thought was weird. They said "The adapters used are below, "BCBCBCBC" stands for the barcodes." In this context, I would like to know if you have worked with this type of adapter and if you know what to do with the barcodes. Should I simply remove the BCBCBC part?
Thanks,
This may simply indicate that the barcode sequences replace the BCBCBCBC of the adapter sequence. You'll need to ask them to be sure. It may be that they have actually replaced the barcode/index sequence with BCBCBCBC for each sample, but I have absolutely no idea why they would do that. They should be able to give you a file of the i5 and i7 indexes used and the combinations of those that were applied to each of your samples.
Yes, apparently they have replaced the actual barcode with BCBCBC for each sample, and I have figured out that all my samples have different barcodes for i7 (and all of them have the same one for i5). In this case, and considering that the pipeline we're working with only allows us to give one i7 and one i5 adapter, do you know how I could proceed? Will I need to perform the trimming once for each sample?
I have already asked them too. Hope they answer soon. Thanks again,
You can make any combination of multiple i7s and multiple i5s that you need to - you are not limited to only one i7 combined with one i5. For example, the following shows two different combinations of i5 and i7 used with each sample:
[tag sequences]
i7-N701:GCTACGCT
i7-N702:GGACTCCT
i5-N501:TAGATCGC
i5-N502:CTCTCTAT
[tag map]
morelia-viridis1_GCCTTCA:i7-N701,i5-N501
cnemidophorus-sexlineatus1_GGTACGC:i7-N702,i5-N502
That said, this assumes you know which indexes were applied to which samples.
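A quick way to check that every entry in [tag map] points at a tag that is actually defined in [tag sequences] is to load the file with Python's configparser, which is the module that appears in the illumiprocessor tracebacks in this issue. The following is a sketch only: the function name is invented, the config text is embedded as a string so the example runs on its own, and setting optionxform to keep names case-sensitive is my assumption rather than something confirmed here about illumiprocessor.

```python
import configparser

CONF_TEXT = """\
[tag sequences]
i7-N701:GCTACGCT
i7-N702:GGACTCCT
i5-N501:TAGATCGC
i5-N502:CTCTCTAT

[tag map]
morelia-viridis1_GCCTTCA:i7-N701,i5-N501
cnemidophorus-sexlineatus1_GGTACGC:i7-N702,i5-N502
"""


def undefined_tags(conf_text):
    """Map each sample to the tag names it uses that are missing from [tag sequences]."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # assumption: keep names case-sensitive (default lower-cases)
    parser.read_string(conf_text)
    known = set(parser["tag sequences"])
    problems = {}
    for sample, tags in parser["tag map"].items():
        missing = [t.strip() for t in tags.split(",") if t.strip() not in known]
        if missing:
            problems[sample] = missing
    return problems


print(undefined_tags(CONF_TEXT))  # an empty dict means every tag is defined
```

For a real file, the text could be read with open(...).read() and passed to the same function.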
You could also, if you want to, just trim the data using trimmomatic for each set of files, using the first part of the adapter sequences for i5 and i7 as the sequence to trim - put that in a file named adapters.fa like so:
>adap1
GATCGGAAGAGCACACGTCTGAACTCCAGTCAC
>adap2
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
Then run:
trimmomatic PE -phred33 \
<read1_file_name>_R1_001.fastq.gz \
<read2_file_name>_R2_001.fastq.gz \
<output_read1_file_name>_R1_001.trim.fastq.gz \
<output_read1_file_name>_R1_001.unpair.fastq.gz \
<output_read2_file_name>_R2_001.trim.fastq.gz \
<output_read2_file_name>_R2_001.unpair.fastq.gz \
ILLUMINACLIP:adapters.fa:2:30:10 LEADING:5 TRAILING:15 SLIDINGWINDOW:4:15 MINLEN:40
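Since this has to be repeated for every pair of files, the command line can be generated rather than typed by hand. The sketch below only builds and prints the command, it does not run it. The helper name, the "trimmed" output folder and the sample file name are placeholders; the trimming steps are the ones given above, with the adapter file called adapters.fa; and it assumes read files named like <name>_R1_001.fastq.gz with a matching _R2_001 file.

```python
import shlex
from pathlib import PurePosixPath

# Trimming steps as given in the command above (adapter file named adapters.fa).
TRIM_STEPS = [
    "ILLUMINACLIP:adapters.fa:2:30:10",
    "LEADING:5",
    "TRAILING:15",
    "SLIDINGWINDOW:4:15",
    "MINLEN:40",
]


def trimmomatic_command(read1, out_dir="trimmed"):
    """Build the paired-end trimmomatic argument list for one R1 file and its R2 mate."""
    read1 = PurePosixPath(read1)
    read2 = read1.with_name(read1.name.replace("_R1_001", "_R2_001"))
    out = PurePosixPath(out_dir)

    def out_name(src, label):
        # sample1_R1_001.fastq.gz -> trimmed/sample1_R1_001.trim.fastq.gz
        return str(out / src.name.replace(".fastq.gz", f".{label}.fastq.gz"))

    return [
        "trimmomatic", "PE", "-phred33",
        str(read1), str(read2),
        out_name(read1, "trim"), out_name(read1, "unpair"),
        out_name(read2, "trim"), out_name(read2, "unpair"),
        *TRIM_STEPS,
    ]


cmd = trimmomatic_command("raw-fastq/sample1_R1_001.fastq.gz")
print(shlex.join(cmd))
```

The returned list could be handed to subprocess.run(cmd, check=True) once the printed command looks right, looping over every *_R1_001.fastq.gz file in the input folder.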
Yes, that makes sense. Thanks for your patience. But I think what I haven't really understood is the adapters with BCBCBC. E.g., my i7 adapter looks like this: GATCGGAAGAGCACACGTCTGAACTCCAGTCAC-BCBCBCBC-ATCTCGTATGCCGTCTTCTGCTTG
Do you think it would work if I just replace the -BCBCBCBC- with *, as in your example config file (the one we've talked about in this issue)?
I am not sure - I somewhat doubt that RapidGenomics searched through all your sequence data and replaced every index sequence with BCBCBCBC in any reads where adapter contamination occurred, but I'm not really sure what they've done (or why they would do what they've done to begin with). You'll just need to find out from them.
That said, if you trim manually with trimmomatic, it won't matter - the manual way searches for only the first part of the adapter sequence, regardless of the index.
Just to make sure that I'm being clear enough, I am attaching a screenshot of the .csv sent by RG. Besides this file, they also sent me the i5 and i7 adapters pasted into an email.
I don't see where in any of this the indexes are equal to BCBCBCBC. The index sequences for each sample are given in Columns D and E. So, the combos for the first and second samples are:
[tag sequences]
i5-507:ACGTCCTG
i7-97:ACTTGGCT
i7-98:TTACGTGC
[tag map]
sample1:i7-97,i5-507
sample2:i7-98,i5-507
Yes, you're right - I think we are misunderstanding each other - but the provided adapters have -BCBCBC- in the middle. So I was wondering if I could replace the -BCBCBC- with *, as in your last example config file. As we've seen before, I tried to run another example that had BCBCBC and illumiprocessor wasn't able to read it. So I thought about making this replacement. What do you think?
Thanks again!
The first part of your configuration file should look like this (which is explained in the documentation for using illumiprocessor):
[adapters]
i7:GATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG
i5:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT*GTGTAGATCTCGGTGGTCGCCGTATCATT
[tag sequences]
i5-507:ACGTCCTG
i7-97:ACTTGGCT
i7-98:TTACGTGC
[tag map]
sample1:i7-97,i5-507
sample2:i7-98,i5-507
Yes, thanks for copying it here. I am following this exact documentation, which is really great. I started this issue with an example that also had the -BCBCBCBC- fragment in the middle of the adapters, and I tried to run it with illumiprocessor, which didn't work. When my raw data arrived (yesterday), Rapid Genomics sent the adapters with -BCBCBCBC- too, and I was wondering if it would be OK to remove this part of the adapters and replace it with *, just as in the example you copied here.
Yes, that is what you should do if you want to use illumiprocessor.
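For anyone doing this replacement on more than a couple of adapters, here is a small sketch of the substitution. The function name and the regular expression are illustrative choices, not part of illumiprocessor: the pattern treats any run of two or more "BC" pairs, with optional surrounding dashes, as the barcode placeholder.

```python
import re

# Two or more "BC" repeats, optionally wrapped in dashes, e.g. "-BCBCBCBC-".
PLACEHOLDER = re.compile(r"-?(?:BC){2,}-?")


def to_illumiprocessor_adapter(adapter):
    """Replace the provider's barcode placeholder with the '*' used in [adapters]."""
    converted, count = PLACEHOLDER.subn("*", adapter)
    if count != 1:
        raise ValueError(f"expected exactly one barcode placeholder, found {count}")
    return converted


i7 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC-BCBCBCBC-ATCTCGTATGCCGTCTTCTGCTTG"
print(to_illumiprocessor_adapter(i7))
```

Raising an error when the placeholder is missing or duplicated is a deliberate choice here, so that a mistyped adapter is caught before it reaches the config file.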
I'm sorry - I don't. If you upload your config file and send me the file names in the raw-fastq directory/folder, I can make a test example to check to see what the issue might be.
Yes, of course. I have just replaced the complicated names from Rapid Genomics for something simpler in order to make it work, but now I am getting a different error:
"[WARNING] Output directory exists, REMOVE [Y/n]? "Y"
2022-10-27 14:39:52,829 - illumiprocessor - INFO - ==================== Starting illumiprocessor ===================
2022-10-27 14:39:52,830 - illumiprocessor - INFO - Version: 2.10
2022-10-27 14:39:52,830 - illumiprocessor - INFO - Argument --config: illumiprocessor.conf
2022-10-27 14:39:52,830 - illumiprocessor - INFO - Argument --cores: 1
2022-10-27 14:39:52,830 - illumiprocessor - INFO - Argument --input: /home/aline/UCEs_Mari/test/raw-fastq
2022-10-27 14:39:52,830 - illumiprocessor - INFO - Argument --log_path: None
2022-10-27 14:39:52,830 - illumiprocessor - INFO - Argument --min_len: 40
2022-10-27 14:39:52,830 - illumiprocessor - INFO - Argument --no_merge: False
2022-10-27 14:39:52,830 - illumiprocessor - INFO - Argument --output: /home/aline/UCEs_Mari/test/clean-fastq
2022-10-27 14:39:52,830 - illumiprocessor - INFO - Argument --phred: phred33
2022-10-27 14:39:52,831 - illumiprocessor - INFO - Argument --r1_pattern: None
2022-10-27 14:39:52,831 - illumiprocessor - INFO - Argument --r2_pattern: None
2022-10-27 14:39:52,831 - illumiprocessor - INFO - Argument --se: False
2022-10-27 14:39:52,831 - illumiprocessor - INFO - Argument --trimmomatic: /home/aline/.conda/envs/phyluce-1.7.1/bin/trimmomatic
2022-10-27 14:39:52,831 - illumiprocessor - INFO - Argument --verbosity: INFO
Traceback (most recent call last):
File "/home/aline/.conda/envs/phyluce-1.7.1/bin/illumiprocessor", line 17, in
My conf file is attached and also a snapshot from the folder in which my fastqs are. P.S.: I've tried to work with a .txt and a .conf file. Neither of them worked. illumiprocessor.txt
Using the attached config file (test.txt) everything works well.
You are seeing the errors you are seeing because your fastq file names are not what illumiprocessor expects by default. So, you need to adjust the regular expression to find the files that you are trimming. You do this by running illumiprocessor with the --r1-pattern and --r2-pattern options, like:
illumiprocessor \
--input raw-fastq \
--output clean-fastq \
--config test.conf \
--r1-pattern "{}_(R1|READ1|Read1|read1).fastq(?:.gz)*" \
--r2-pattern "{}_(R2|READ2|Read2|read2).fastq(?:.gz)*"
This will find files in raw-fastq that end in .fastq or .fastq.gz, and it substitutes the names from the [names] section to the left of the : into the brackets {}.
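To see which file names a given pattern would pick up before running the whole program, the substitution can be tried by hand. This is a sketch under an assumption: it fills {} with the name and then searches each file name with Python's re module, which is how the description above reads, but the exact matching code inside illumiprocessor is not shown in this issue. The helper name and the file names are made up.

```python
import re

R1_PATTERN = "{}_(R1|READ1|Read1|read1).fastq(?:.gz)*"


def matching_files(name, pattern, filenames):
    """Fill the {} placeholder with the name, then keep file names the regex finds."""
    regex = re.compile(pattern.format(name))
    return [f for f in filenames if regex.search(f)]


files = ["SULA1_R1.fastq.gz", "SULA1_R2.fastq.gz", "sula2_R1.fastq"]

print(matching_files("SULA1", R1_PATTERN, files))  # exact capitalization
print(matching_files("Sula1", R1_PATTERN, files))  # different capitalization
```

Regular expressions are case-sensitive by default, so a name in the config file that differs from the file name only in capitalization will not match anything under this assumption.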
Thanks, Brant. My files end in _R1.fastq and _R2.fastq, so my command looks like this:
illumiprocessor \
--input raw-fastq \
--output clean-fastq \
--config test.conf \
--r1-pattern "{}_R1.fastq" \
--r2-pattern "{}_R2.fastq"
Is it correct?
Assuming your files remain named like your example files, either what I sent or what you suggest should work (what I sent makes several items "optional" - so it looks more complicated but matches your example name just fine).
This is really weird, because I keep getting the same error. :( I tried it with your command and with mine, and neither of them worked out.
illumiprocessor \
--input raw-fastq \
--output clean-fastq \
--config test.txt \
--r1-pattern "{}_(R1|READ1|Read1|read1).fastq(?:.gz)*" \
--r2-pattern "{}_(R2|READ2|Read2|read2).fastq(?:.gz)*"
[WARNING] Output directory exists, REMOVE [Y/n]? "Y"
2022-10-27 15:19:44,917 - illumiprocessor - INFO - ==================== Starting illumiprocessor ===================
2022-10-27 15:19:44,918 - illumiprocessor - INFO - Version: 2.10
2022-10-27 15:19:44,918 - illumiprocessor - INFO - Argument --config: test.txt
2022-10-27 15:19:44,918 - illumiprocessor - INFO - Argument --cores: 1
2022-10-27 15:19:44,918 - illumiprocessor - INFO - Argument --input: /home/aline/UCEs_Mari/test/raw-fastq
2022-10-27 15:19:44,918 - illumiprocessor - INFO - Argument --log_path: None
2022-10-27 15:19:44,918 - illumiprocessor - INFO - Argument --min_len: 40
2022-10-27 15:19:44,918 - illumiprocessor - INFO - Argument --no_merge: False
2022-10-27 15:19:44,918 - illumiprocessor - INFO - Argument --output: /home/aline/UCEs_Mari/test/clean-fastq
2022-10-27 15:19:44,918 - illumiprocessor - INFO - Argument --phred: phred33
2022-10-27 15:19:44,919 - illumiprocessor - INFO - Argument --r1_pattern: {}_(R1|READ1|Read1|read1).fastq(?:.gz)*
2022-10-27 15:19:44,919 - illumiprocessor - INFO - Argument --r2_pattern: {}_(R2|READ2|Read2|read2).fastq(?:.gz)*
2022-10-27 15:19:44,919 - illumiprocessor - INFO - Argument --se: False
2022-10-27 15:19:44,920 - illumiprocessor - INFO - Argument --trimmomatic: /home/aline/.conda/envs/phyluce-1.7.1/bin/trimmomatic
2022-10-27 15:19:44,920 - illumiprocessor - INFO - Argument --verbosity: INFO
Traceback (most recent call last):
File "/home/aline/.conda/envs/phyluce-1.7.1/bin/illumiprocessor", line 17, in <module>
sys.exit(main())
File "/home/aline/.conda/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/cli/main.py", line 114, in main
main(args)
File "/home/aline/.conda/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/main.py", line 34, in main
reads.append(core.SequenceData(args, conf, start_name, end_name))
File "/home/aline/.conda/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/core.py", line 85, in __init__
self._get_read_data()
File "/home/aline/.conda/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/core.py", line 106, in _get_read_data
"errors in your conf file.".format(self.start_name)
OSError: There is a problem with the read names for SULA1. Ensure you do not have spelling/capitalization errors in your conf file.
I'm not sure what else to try - I cannot repeat the error you are seeing on my machines. Your best bet may be to trim files one-by-one (using trimmomatic or whatever you prefer) and name the resulting files in a way that phyluce expects.
Brant, I found it, it was my mistake. There was really a capitalization error. However, now I am getting this one:
[WARNING] Output directory exists, REMOVE [Y/n]? "Y"
2022-10-27 15:32:57,149 - illumiprocessor - INFO - ==================== Starting illumiprocessor ===================
2022-10-27 15:32:57,149 - illumiprocessor - INFO - Version: 2.10
2022-10-27 15:32:57,149 - illumiprocessor - INFO - Argument --config: test.txt
2022-10-27 15:32:57,149 - illumiprocessor - INFO - Argument --cores: 1
2022-10-27 15:32:57,150 - illumiprocessor - INFO - Argument --input: /home/aline/UCEs_Mari/test/raw-fastq
2022-10-27 15:32:57,150 - illumiprocessor - INFO - Argument --log_path: None
2022-10-27 15:32:57,150 - illumiprocessor - INFO - Argument --min_len: 40
2022-10-27 15:32:57,150 - illumiprocessor - INFO - Argument --no_merge: False
2022-10-27 15:32:57,150 - illumiprocessor - INFO - Argument --output: /home/aline/UCEs_Mari/test/clean-fastq
2022-10-27 15:32:57,150 - illumiprocessor - INFO - Argument --phred: phred33
2022-10-27 15:32:57,150 - illumiprocessor - INFO - Argument --r1_pattern: {}_R1.fastq
2022-10-27 15:32:57,150 - illumiprocessor - INFO - Argument --r2_pattern: {}_R2.fastq
2022-10-27 15:32:57,150 - illumiprocessor - INFO - Argument --se: False
2022-10-27 15:32:57,150 - illumiprocessor - INFO - Argument --trimmomatic: /home/aline/.conda/envs/phyluce-1.7.1/bin/trimmomatic
2022-10-27 15:32:57,150 - illumiprocessor - INFO - Argument --verbosity: INFO
2022-10-27 15:32:57,152 - illumiprocessor - INFO - Trimming samples with Trimmomatic
RunningTraceback (most recent call last):
File "/home/aline/.conda/envs/phyluce-1.7.1/bin/illumiprocessor", line 17, in
Again, I am not seeing this error or able to repeat it. I think you should try to move ahead in another way.
Thanks for all your help and patience, Brant. The best part is that I have just managed to get it to work, and I don't have any idea what I did to correct it - I only worked with the .gz files instead of the unzipped files, and followed my last example command, just adding the .gz at the end. I don't know if I should be happy or sad, but at least it worked! If you have any idea of what could have happened, I would be glad to hear (read) it. Thanks again and cheers. I hope I can get through the next steps without coming back here to ask you more things! :)
Cool 👍
I am having some issues when trying to use the pipeline for illumiprocessor, and would be very grateful if one of you is able to help me figure it out. I am working on MacOS Monterey 12.6, using some example files from Rapid Genomics, and my .conf file looks like this:
[adapters]
i7 = GATCGGAAGAGCACACGTCTGAACTCCAGTCACBCBCBCBCATCTCGTATGCCGTCTTCTGCTTG i5 = AATGATACGGCGACCACCGAGATCTACACBCBCBCBCACACTCTTTCCCTACACGACGCTCTTCCGATCT
[tag sequences]
i5P01WG01 = TCTACTCT i7P01WG01 = CACCTTAC
[tag map]
RAPiDGenomicsP01WG01 = i5P01WG01,i7P01WG01
[names]
RAPiDGenomicsP01WG01 = apeteJLB07
When I try to run "illumiprocessor --input 1_raw-fastq --output 2_clean-fastq --config illumiprocessor.conf --cores 4", I get the following error (I am pasting the complete error):
"2022-10-17 16:25:46,835 - illumiprocessor - INFO - ==================== Starting illumiprocessor ===================
2022-10-17 16:25:46,835 - illumiprocessor - INFO - Version: 2.10
2022-10-17 16:25:46,835 - illumiprocessor - INFO - Argument --config: illumiprocessor.conf
2022-10-17 16:25:46,835 - illumiprocessor - INFO - Argument --cores: 4
2022-10-17 16:25:46,836 - illumiprocessor - INFO - Argument --input: /Users/marimazzochi/Desktop/1_raw-fastq
2022-10-17 16:25:46,836 - illumiprocessor - INFO - Argument --log_path: None
2022-10-17 16:25:46,836 - illumiprocessor - INFO - Argument --min_len: 40
2022-10-17 16:25:46,836 - illumiprocessor - INFO - Argument --no_merge: False
2022-10-17 16:25:46,836 - illumiprocessor - INFO - Argument --output: /Users/marimazzochi/Desktop/2_clean-fastq
2022-10-17 16:25:46,836 - illumiprocessor - INFO - Argument --phred: phred33
2022-10-17 16:25:46,836 - illumiprocessor - INFO - Argument --r1_pattern: _R1
2022-10-17 16:25:46,836 - illumiprocessor - INFO - Argument --r2_pattern: _R2
2022-10-17 16:25:46,836 - illumiprocessor - INFO - Argument --se: False
2022-10-17 16:25:46,836 - illumiprocessor - INFO - Argument --trimmomatic: /Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/bin/trimmomatic
2022-10-17 16:25:46,836 - illumiprocessor - INFO - Argument --verbosity: INFO
Traceback (most recent call last):
File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/bin/illumiprocessor", line 17, in <module>
sys.exit(main())
File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/cli/main.py", line 114, in main
main(args)
File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/main.py", line 29, in main
conf.read(args.config)
File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/configparser.py", line 697, in read
self._read(fp, filename)
File "/Users/marimazzochi/opt/miniconda3/envs/phyluce-1.7.1/lib/python3.6/configparser.py", line 1080, in _read
raise MissingSectionHeaderError(fpname, lineno, line)
configparser.MissingSectionHeaderError: File contains no section headers."
In order to make it work, I tried some things that I found on Stack Overflow and on GitHub, but nothing worked: 1 - replacing "-" with "*" in the adapters section, because some people who use MacOS said that this caused trouble for them; 2 - replacing ":" with "="; 3 - changing UTF-8 to ASCII when saving my .conf file. I suspect that this issue may be related to the formatting of the .conf file. I am not able to save it as .conf, only as .rtf in TextEdit or as .csv in Numbers, and then I am manually renaming it to .conf in the Terminal (using the command 'mv'). That said, do you have any suggestions about the error, or maybe about a different way of creating a .conf file?
Cheers, Mariana Mazzochi
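A note for anyone who lands here with the same MissingSectionHeaderError: a file saved as .rtf and then renamed with 'mv' is still rich text, and RTF files begin with "{\rtf" rather than with a [section] line. The sketch below looks at the first bytes of a file to tell these cases apart. The function name, the return labels and the demo files are all made up for illustration, and it is not an exhaustive check.

```python
import tempfile
from pathlib import Path


def diagnose_conf(path):
    """Classify a config file by its leading bytes and first meaningful line."""
    raw = Path(path).read_bytes()
    if raw.startswith(b"{\\rtf"):
        return "rtf"  # rich text that was renamed, not plain text
    if raw.startswith(b"\xef\xbb\xbf"):
        return "utf8-bom"  # plain text with a byte-order mark in front
    for line in raw.decode("utf-8", errors="replace").splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#;":
            continue  # skip blank lines and comment lines
        return "ok" if stripped.startswith("[") else "no-section-header"
    return "empty"


# Demonstration with throwaway files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    renamed = Path(tmp) / "renamed.conf"
    renamed.write_text("{\\rtf1\\ansi [adapters]}")
    plain = Path(tmp) / "plain.conf"
    plain.write_text("[adapters]\ni7:ACGT*ACGT\n")
    report = {p.name: diagnose_conf(p) for p in (renamed, plain)}
    print(report)
```

Anything other than "ok" suggests re-saving the file as plain text from a plain-text editor rather than renaming it.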