faircloth-lab / illumiprocessor

pre-process illumina reads
http://illumiprocessor.readthedocs.org/

Problem with the ConfigParser #5

Open lsbarrientos50 opened 9 years ago

lsbarrientos50 commented 9 years ago

I am having a problem with ConfigParser.py: it couldn't find the section "names" in my config file. I double-checked that all the names and the path are correct. I looked for a problem in ConfigParser.py itself but couldn't find one. I also checked the installations of illumiprocessor, trimmomatic, and Java, and they look fine. I tried reinstalling everything, and the problem is still there.


akijarl commented 7 years ago

I'm having this exact same problem now. Was there ever a clear solution to this?

GideonPisanty commented 6 years ago

Same problem... if anyone has a solution please post it here. Thanks.

brantfaircloth commented 6 years ago

Can someone upload their config file, please? Also, it may be easiest to diagnose the config file problem by shortening it to contain only 1-2 samples in order to see if the problem is in the naming approach being used.

GideonPisanty commented 6 years ago

I have tried several different versions of the config file and none worked. This is the latest one:

[adapters]
i7:GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCTCGTATGCCGTCTTCTGCTTG
i5:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTGTGTAGATCTCGGTGGTCGCCGTATCATT

[tag sequences]
i7-112_06:CCGGAATT
i5-19_F:TGGCTCTT

[tag map]
aegyptiaca_S34:i7-112_06,i5-19_F

[names]
aegyptiaca_S34:aegyptiaca


brantfaircloth commented 6 years ago

The following config file appears to work fine for me:

[adapters]
i7:GATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG
i5:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT*GTGTAGATCTCGGTGGTCGCCGTATCATT

[tag sequences]
i7-112_06:CCGGAATT
i5-19_F:TGGCTCTT

[tag map]
aegyptiaca_S34:i7-112_06,i5-19_F

[names]
aegyptiaca_S34:aegyptiaca

when the input is two files in a directory named test, as below:

aegyptiaca_S34_T3_R1_001.fastq.gz
aegyptiaca_S34_T3_R2_001.fastq.gz

These are not your data, but two files made to look like your data. I formed the names so they would be like what illumiprocessor expects. The tree of the input and output folders looks like this:

.
├── test
│   ├── aegyptiaca_S34_T3_R1_001.fastq.gz
│   └── aegyptiaca_S34_T3_R2_001.fastq.gz
├── clean
│   └── aegyptiaca
│       ├── adapters.fasta
│       ├── raw-reads
│       │   ├── aegyptiaca-READ1.fastq.gz -> /home/bcf/tmp/test/test/aegyptiaca_S34_T3_R1_001.fastq.gz
│       │   └── aegyptiaca-READ2.fastq.gz -> /home/bcf/tmp/test/test/aegyptiaca_S34_T3_R2_001.fastq.gz
│       ├── split-adapter-quality-trimmed
│       │   ├── aegyptiaca-READ1.fastq.gz
│       │   ├── aegyptiaca-READ2.fastq.gz
│       │   └── aegyptiaca-READ-singleton.fastq.gz
│       └── stats
│           └── aegyptiaca-adapter-contam.txt
├── illumiprocessor.log
└── test.conf

GideonPisanty commented 6 years ago

Thanks for the answer. For now, I bypassed the problem by running trimmomatic directly. Haven't yet solved this issue.

GideonPisanty commented 6 years ago

The problem seems to be solved by including the name of the config file in the "--config" line of the command. Thanks to my colleague Jackson Eyres for solving this.

However, now that it is solved, I have the other issue with the read names, frequently encountered by other users here:

"errors in your conf file.".format(self.start_name))
IOError: There is a problem with the read names for morelia-viridis1_GCCTTCA. Ensure you do not have spelling/capitalization errors in your conf file.

I have tried all the solutions mentioned in the threads here, and renamed my files in numerous different ways, but none worked for me.

brantfaircloth commented 6 years ago

The illumiprocessor --help command indicates that the --config file needs to be passed to the program. Similarly, the example code that I sent you also includes the config file in the program invocation, following --config, so it should not be a surprise that the file name is needed.
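One plausible reason a wrong or missing --config path shows up as a missing "names" section, rather than as a file error, is how Python's standard config parser behaves: its read() method silently skips files it cannot open, so the failure only appears later when a section is requested. The sketch below illustrates that behaviour with the Python 3 configparser module. It is not illumiprocessor's own code, and the helper names (load_names, names_without_check) and the shortened config text are made up for illustration.

```python
import configparser
import os
import tempfile

# Shortened, illustrative config with the same "key:value" layout as above.
CONF_TEXT = """\
[tag map]
aegyptiaca_S34:i7-112_06,i5-19_F

[names]
aegyptiaca_S34:aegyptiaca
"""


def names_without_check(path):
    """Read [names] without checking that the file was actually parsed."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case as written
    parser.read(path)  # no error here, even if `path` does not exist
    # For an unread file this raises configparser.NoSectionError.
    return dict(parser.items("names"))


def load_names(path):
    """Read [names], failing early if the config file could not be read."""
    parser = configparser.ConfigParser()
    parser.optionxform = str
    parsed = parser.read(path)  # list of files that were successfully parsed
    if not parsed:
        raise FileNotFoundError("could not read config file: {}".format(path))
    return dict(parser.items("names"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "test.conf")
        with open(good, "w") as handle:
            handle.write(CONF_TEXT)
        print(load_names(good))

        missing = os.path.join(tmp, "missing.conf")
        try:
            names_without_check(missing)
        except configparser.NoSectionError as err:
            print("NoSectionError:", err)
```

If the "names" section is reported missing even though the file clearly contains it, checking that the path given after --config points to an existing, readable file is a cheap first step.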

The second issue you are encountering is frequently encountered because users enter their file names incorrectly or because their fastq files have a naming format that is different from what is expected by the program. Basically, illumiprocessor is having a hard time finding the correct read pairs that go with the name that you have provided for morelia-viridis1_GCCTTCA. This is either because the read files have slightly different names from what you entered or because the format of the read files is different from what illumiprocessor expects (the test data you sent me worked fine, so the format may not be the cause). You'll need to figure out which of those is causing the problem. If you need to change the general naming format for which illumiprocessor is searching, that requires changing the regular expressions that are used to find read files. By default, those regular expressions are:

r1_pattern = "{}_(?:.*)_(R1|READ1|Read1|read1)_\d+.fastq(?:.gz)*"
r2_pattern = "{}_(?:.*)_(R2|READ2|Read2|read2)_\d+.fastq(?:.gz)*"

In this case, the name morelia-viridis1_GCCTTCA is substituted where the squiggly braces ({}) are in the example above, then the regular expression is constructed. You can set that to whatever is needed to find your R1 and R2 files. Because R1 and R2 come from Illumina sequencers (or sequencing providers) in all sorts of naming combinations, you're able to adjust what the program looks for (the --r1-pattern and --r2-pattern options).
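To see how the substitution plays out, the sketch below fills a name into the two default patterns quoted above and tests them against the example file names from earlier in this thread. It assumes the resulting expression is matched from the start of each file name with Python's re.match, which may differ from what illumiprocessor does internally; find_reads is a hypothetical helper, not part of the program.

```python
import re

# Default patterns as quoted above; "{}" is replaced by the name from the config.
R1_PATTERN = r"{}_(?:.*)_(R1|READ1|Read1|read1)_\d+.fastq(?:.gz)*"
R2_PATTERN = r"{}_(?:.*)_(R2|READ2|Read2|read2)_\d+.fastq(?:.gz)*"


def find_reads(name, pattern, filenames):
    """Return the file names matched once `name` is substituted into `pattern`.

    Assumption: the name is inserted as-is (not escaped) and the expression
    is matched from the beginning of the file name.
    """
    regex = re.compile(pattern.format(name))
    return [f for f in filenames if regex.match(f)]


FILES = [
    "aegyptiaca_S34_T3_R1_001.fastq.gz",
    "aegyptiaca_S34_T3_R2_001.fastq.gz",
]

if __name__ == "__main__":
    # Name written exactly as in the file names: one R1 and one R2 file found.
    print(find_reads("aegyptiaca_S34", R1_PATTERN, FILES))
    print(find_reads("aegyptiaca_S34", R2_PATTERN, FILES))
    # A capitalization or hyphen/underscore slip in the name finds nothing.
    print(find_reads("Aegyptiaca_S34", R1_PATTERN, FILES))
    print(find_reads("aegyptiaca-S34", R1_PATTERN, FILES))
```

Running a check like this against a directory listing is a quick way to tell a misspelled name apart from a file naming format that the default patterns do not cover.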

GideonPisanty commented 6 years ago

Thanks.

GideonPisanty commented 6 years ago

Now it's working, not sure why. I might have mixed underscores with hyphens in the species names.

zhangpizhu commented 6 years ago

I also have a problem like this, and I don't know what it means. Is there an error in my config file? If anyone knows how to solve this problem, please post it here. Thanks.

My config file is below:

[adapters]
i7:GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCTCGTATGCCGTCTTCTGCTTG
i5:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTGTGTAGATCTCGGTGGTCGCCGTATCATT

[tag sequences]
i7-7893:GAATCTGT
i5-7893:ATAGCGAC

[tag map]
sl337893:i7-7893,i5-7893

[names]
sl337893:sl337893

The names of the input files are below:

sl337893_T1_R1_7893.fastq.gz
sl337893_T1_R2_7893.fastq.gz

brantfaircloth commented 6 years ago

See above. You'll likely need to adjust either your config file or the regular expression to deal with your file names.

zhangpizhu commented 5 years ago

I've dealt with my problems. Thank you very much.

mchj74 commented 3 years ago

Hi Brant, I am having issues with the config file when I try to run illumiprocessor:

For a single-indexed library, i7 is my index primer sequence and i5 is the NEBNEXT universal primer sequence. Two files, BRF27_POM134_S27_R1_001.fastq.gz and BRF27_POM134_S27_R2_001.fastq.gz, are inside a folder named POM134.

The script I am using is:

module load phyluce/1.6.8
module load illumiprocessor/2.0.9
illumiprocessor --input POM134 --output clean --config illumiprocessor.conf --cores 4 --trimmomatic ${TRIMMOMATIC_HOME}/bin/trimmomatic

The config file is as below:

[adapters]
i7:CAAGCAGAAGACGGCATACGAGAT*GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
i5:AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT

[tag sequences]
i7-P1-A1:TTACCGAC

[tag map]
BRF27_POM134_S27:i7-P1-A1

[names]
BRF27_POM134_S27:C.sp

I am getting this error many times:

"errors in your conf file.".format(self.start_name))
IOError: There is a problem with the read names for BRF27_POM134_S27. Ensure you do not have spelling/capitalization errors in your conf file.

Can you help me fix it?

brantfaircloth commented 3 years ago

It looks like your reads are named in a way that is not expected by phyluce. This means that you need to change the regular expression used to match the read data given the read names. The following should do it, although I have not tested it:

--r1-pattern "{}_(R1)_\d+.fastq(?:.gz)*"
--r2-pattern "{}_(R2)_\d+.fastq(?:.gz)*"
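The difference between the default patterns and the two suggested above can be checked directly against the file names from this report. The sketch below assumes the name is substituted for {} and the result is matched from the start of each file name with Python's re.match, which may not be exactly how the program applies it; matches is a hypothetical helper.

```python
import re

# Default patterns (quoted earlier in this thread) and the suggested replacements.
DEFAULT_R1 = r"{}_(?:.*)_(R1|READ1|Read1|read1)_\d+.fastq(?:.gz)*"
DEFAULT_R2 = r"{}_(?:.*)_(R2|READ2|Read2|read2)_\d+.fastq(?:.gz)*"
CUSTOM_R1 = r"{}_(R1)_\d+.fastq(?:.gz)*"
CUSTOM_R2 = r"{}_(R2)_\d+.fastq(?:.gz)*"

NAME = "BRF27_POM134_S27"
R1_FILE = "BRF27_POM134_S27_R1_001.fastq.gz"
R2_FILE = "BRF27_POM134_S27_R2_001.fastq.gz"


def matches(pattern, name, filename):
    """True if `filename` matches `pattern` after `name` replaces "{}"."""
    return re.match(pattern.format(name), filename) is not None


if __name__ == "__main__":
    # The defaults require an extra "_<something>_" field between the name
    # and R1/R2, which these file names do not have.
    print(matches(DEFAULT_R1, NAME, R1_FILE))  # False
    print(matches(DEFAULT_R2, NAME, R2_FILE))  # False
    # The suggested patterns expect R1/R2 directly after the name.
    print(matches(CUSTOM_R1, NAME, R1_FILE))   # True
    print(matches(CUSTOM_R2, NAME, R2_FILE))   # True
```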

Also, when you rename your reads in the [names] section, you need to use something that does not contain most symbols (underscore "_" is ok). For example:

[names]
BRF27_POM134_S27:C_sp

mchj74 commented 3 years ago

Not sorted