faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

Adding Barcode+Adapter Removed SRA Sequences into Phyluce #234

Open alexkrohn opened 3 years ago

alexkrohn commented 3 years ago

Hi there. I have a dataset of demultiplexed FASTQ files that I generated, and I would love to combine it with data pulled from the SRA. The SRA data are demultiplexed and already have their barcodes and adapters trimmed off. However, like my data, the SRA raw reads still need to be trimmed for quality. I would love to use illumiprocessor to do the QC and trimming so that I can integrate all the reads into the phyluce pipeline.

How can I configure the illumiprocessor.conf file to not bother looking for adapters+barcodes in some of the files?

Let's say my files are named:

old-reads_IND001_R1_001.fastq.gz
old-reads_IND001_R2_001.fastq.gz
new-reads_IND002_R1_001.fastq.gz
new-reads_IND002_R2_001.fastq.gz

I've tried:

[adapters]
....

[tag sequences]
....

[tag map]
IND001:i5_17_G,i7_105_01

[names]
IND001:IND001
IND002:IND002

and

[adapters]
....

[tag sequences]
....

[tag map]
IND001:i5_17_G,i7_105_01
IND002:

[names]
IND001:IND001
IND002:IND002

But neither seems to work. Any suggestions?

brantfaircloth commented 3 years ago

You can’t really do it that way. Your best bet is to use some dummy sequences for the indexes. The quality trimming will proceed, as well. Alternatively, just trim them for quality externally, get them in the expected format and assemble using phyluce tools.
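
One way to apply the dummy-index suggestion is to fill in the `[tag map]` section programmatically, so that every sample without real indexes borrows a placeholder pair. The following is a minimal sketch and not part of illumiprocessor: `build_tag_map` and `render_tag_map` are hypothetical helpers, and the tag names are the ones from the config earlier in this thread. It assumes, per the suggestion above, that illumiprocessor tolerates index sequences that do not occur in the reads.

```python
def build_tag_map(samples, real_tags, dummy_tags):
    """Return {sample: (i5_tag, i7_tag)}, using dummy_tags where none are known."""
    return {sample: real_tags.get(sample, dummy_tags) for sample in samples}


def render_tag_map(tag_map):
    """Format the mapping as the lines of a [tag map] section."""
    lines = ["[tag map]"]
    for sample, (i5, i7) in tag_map.items():
        lines.append(f"{sample}:{i5},{i7}")
    return "\n".join(lines)


# Tag names are taken from the example config in this thread.
real_tags = {"IND001": ("i5_17_G", "i7_105_01")}
# Assumption: an existing tag pair is reused as the placeholder for SRA samples.
dummy_tags = ("i5_17_G", "i7_105_01")

tag_map = build_tag_map(["IND001", "IND002"], real_tags, dummy_tags)
print(render_tag_map(tag_map))
```

The printed section can be pasted into the config in place of the incomplete `[tag map]`; the `[adapters]` and `[tag sequences]` sections still have to define whatever tag names are used.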

alexkrohn commented 3 years ago

Got it. I just pasted one of the random tags onto the end of the individuals. I assume illumiprocessor will search for the tag, not find it, and move on to trimming, as you say.

Unfortunately, I got this error, which I don't quite understand. I figure the file should exist in order for the pipeline to work on it :-D

Does this have to do with the indexes, or is it something else entirely?

Traceback (most recent call last):
  File "/home/tangled/tbc/compute/alex_compute/miniconda2/envs/phyluce-1.7.1/bin/illumiprocessor", line 17, in <module>
    sys.exit(main())
  File "/home/tangled/tbc/compute/alex_compute/miniconda2/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/cli/main.py", line 114, in main
    main(args)
  File "/home/tangled/tbc/compute/alex_compute/miniconda2/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/main.py", line 36, in main
    core.create_new_dirs(reads)
  File "/home/tangled/tbc/compute/alex_compute/miniconda2/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/core.py", line 337, in create_new_dirs
    os.symlink(reads, new_file)
FileExistsError: [Errno 17] File exists: '/home/tangled/tbc/compute/UCE/PIME/30-520350235_060421/raw-fastq/P.m.mg_Hernando_FL_S20_R1_001.fastq.gz' -> '/home/tangled/tbc/compute/alex_compute/UCEs/pime/combined-data/clean-fastq/S2/raw-reads/S2-READ1.fastq.gz'

brantfaircloth commented 3 years ago

For some reason it thinks the file already exists. You might have a duplicate file name somewhere in there.
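
The traceback ends in `os.symlink`, which refuses to create a link when the destination path already exists. Here is a small sketch of that behavior, assuming a POSIX system and using made-up file names. It does not reproduce illumiprocessor's own logic, only the standard-library call named in the traceback.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Two distinct source files standing in for two raw FASTQ files
    # (hypothetical names, chosen only for illustration).
    first = os.path.join(tmp, "sample_S2_R1.fastq.gz")
    second = os.path.join(tmp, "sample_S20_R1.fastq.gz")
    for path in (first, second):
        open(path, "w").close()

    link = os.path.join(tmp, "S2-READ1.fastq.gz")
    os.symlink(first, link)  # the first link is created normally
    try:
        os.symlink(second, link)  # same destination a second time
        collided = False
    except FileExistsError:
        collided = True

print(collided)
```

If two raw files end up assigned to the same sample, the second attempt to link into that sample's `raw-reads` directory would fail in this way.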

alexkrohn commented 3 years ago

Ahhh. I see now. It looks like it's conflating individual S20 with S2. I will correct the file names to be S20 and S02 to hopefully avoid that in the future.
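
A quick check like the following can catch this kind of collision before running the pipeline. It is a sketch under the assumption that sample names are matched against file names by substring or prefix, which is what the S2/S20 mix-up suggests. `ambiguous_pairs` and `zero_pad` are hypothetical helpers, not phyluce or illumiprocessor functions.

```python
import re


def ambiguous_pairs(names):
    """Return (short, long) pairs where one sample name is contained in another."""
    return [
        (a, b)
        for a in names
        for b in names
        if a != b and a in b
    ]


def zero_pad(name, width=2):
    """Zero-pad the number in a name of the form <non-digits><digits>."""
    match = re.fullmatch(r"(\D*)(\d+)", name)
    if match is None:
        return name  # leave names of any other shape untouched
    prefix, number = match.groups()
    return f"{prefix}{number.zfill(width)}"


names = ["S2", "S20"]
print(ambiguous_pairs(names))    # [('S2', 'S20')]

padded = [zero_pad(n) for n in names]
print(padded)                    # ['S02', 'S20']
print(ambiguous_pairs(padded))   # []
```

Padding every sample number to the same width means no name can be a prefix of another, which is the same fix as renaming S2 to S02.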