faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

Use phyluce_assembly_get_trinity_coverage for abyss data? #154

Closed ymilesz closed 3 years ago

ymilesz commented 5 years ago

Hi Brant,

Is it possible to get coverage data for ABySS assemblies using phyluce_assembly_get_trinity_coverage.py? I assumed so, since --assembler lets you select the program, but I am getting the following error, which seems different from the others posted so far:

2019-04-04 19:35:44,922 - phyluce_assembly_get_trinity_coverage - INFO - ========= Starting phyluce_assembly_get_trinity_coverage ========
2019-04-04 19:35:44,924 - phyluce_assembly_get_trinity_coverage - INFO - Version: git fatal: Not a git repository: '/apps/phyluce/20190308/lib/python2.7/site-packages/.git'
2019-04-04 19:35:44,924 - phyluce_assembly_get_trinity_coverage - INFO - Argument --assembler: abyss
2019-04-04 19:35:44,924 - phyluce_assembly_get_trinity_coverage - INFO - Argument --assemblies: /ufrc/lucky/yuanmeng.zhang/Nylanderia/abyss35-assemblies
2019-04-04 19:35:44,924 - phyluce_assembly_get_trinity_coverage - INFO - Argument --assemblo_config: /ufrc/lucky/yuanmeng.zhang/Nylanderia/assembly.conf
2019-04-04 19:35:44,924 - phyluce_assembly_get_trinity_coverage - INFO - Argument --bwa_mem: False
2019-04-04 19:35:44,924 - phyluce_assembly_get_trinity_coverage - INFO - Argument --clean: False
2019-04-04 19:35:44,924 - phyluce_assembly_get_trinity_coverage - INFO - Argument --cores: 4
2019-04-04 19:35:44,925 - phyluce_assembly_get_trinity_coverage - INFO - Argument --log_path: /ufrc/lucky/yuanmeng.zhang/Nylanderia/log
2019-04-04 19:35:44,925 - phyluce_assembly_get_trinity_coverage - INFO - Argument --subfolder:
2019-04-04 19:35:44,925 - phyluce_assembly_get_trinity_coverage - INFO - Argument --trim: False
2019-04-04 19:35:44,925 - phyluce_assembly_get_trinity_coverage - INFO - Argument --verbosity: INFO
2019-04-04 19:35:44,925 - phyluce_assembly_get_trinity_coverage - INFO - Getting input filenames
2019-04-04 19:35:44,965 - phyluce_assembly_get_trinity_coverage - INFO - ------------------- Processing NylaM01_bibadia ------------------
2019-04-04 19:35:44,966 - phyluce_assembly_get_trinity_coverage - INFO - Finding fastq/fasta files
2019-04-04 19:35:44,971 - phyluce_assembly_get_trinity_coverage - INFO - File type is fastq
2019-04-04 19:35:44,972 - phyluce_assembly_get_trinity_coverage - INFO - Running bwa indexing against /ufrc/lucky/yuanmeng.zhang/Nylanderia/abyss35-assemblies/NylaM01_bibadia/contigs.fasta
2019-04-04 19:36:05,564 - phyluce_assembly_get_trinity_coverage - INFO - Indexing fasta for NylaM01_bibadia
2019-04-04 19:36:05,938 - phyluce_assembly_get_trinity_coverage - INFO - Creating FASTA dict for NylaM01_bibadia
2019-04-04 19:36:06,841 - phyluce_assembly_get_trinity_coverage - INFO - Creating read index file for NylaM01_bibadia-READ1.fastq.gz
2019-04-04 19:36:41,604 - phyluce_assembly_get_trinity_coverage - INFO - Creating read index file for NylaM01_bibadia-READ2.fastq.gz
2019-04-04 19:37:16,378 - phyluce_assembly_get_trinity_coverage - INFO - Building BAM for NylaM01_bibadia
2019-04-04 19:37:53,477 - phyluce_assembly_get_trinity_coverage - INFO - Cleaning BAM for NylaM01_bibadia
2019-04-04 19:38:16,786 - phyluce_assembly_get_trinity_coverage - INFO - Adding RG header to BAM for NylaM01_bibadia
2019-04-04 19:38:56,422 - phyluce_assembly_get_trinity_coverage - INFO - Creating read index file for NylaM01_bibadia-READ-singleton.fastq.gz
2019-04-04 19:38:57,139 - phyluce_assembly_get_trinity_coverage - INFO - Building BAM for NylaM01_bibadia
2019-04-04 19:38:57,943 - phyluce_assembly_get_trinity_coverage - INFO - Cleaning BAM for NylaM01_bibadia
2019-04-04 19:39:00,810 - phyluce_assembly_get_trinity_coverage - INFO - Adding RG header to BAM for NylaM01_bibadia
2019-04-04 19:39:04,631 - phyluce_assembly_get_trinity_coverage - INFO - Merging BAMs for NylaM01_bibadia
2019-04-04 19:39:25,832 - phyluce_assembly_get_trinity_coverage - INFO - Marking read duplicates from BAM for NylaM01_bibadia
2019-04-04 19:40:04,749 - phyluce_assembly_get_trinity_coverage - INFO - Indexing BAM for NylaM01_bibadia
2019-04-04 19:40:06,621 - phyluce_assembly_get_trinity_coverage - INFO - Computing coverage with GATK for NylaM01_bibadia
2019-04-04 19:40:06,676 - phyluce_assembly_get_trinity_coverage - INFO - Screening contigs for coverage
Traceback (most recent call last):
  File "/apps/phyluce/20190308/bin/phyluce_assembly_get_trinity_coverage", line 238, in <module>
    main()
  File "/apps/phyluce/20190308/bin/phyluce_assembly_get_trinity_coverage", line 228, in main
    overall_contigs = gatk.get_untrimmed_coverage_from_output(log, sample, assembly_pth, coverage, args.assembler)
  File "/apps/phyluce/20190308/lib/python2.7/site-packages/phyluce/gatk.py", line 216, in get_untrimmed_coverage_from_output
    with open(coverage, 'rU') as infile:
IOError: [Errno 2] No such file or directory: '/ufrc/lucky/yuanmeng.zhang/Nylanderia/abyss35-assemblies/NylaM01_bibadia/NylaM01_bibadia-coverage'

shahanderkarabetian commented 5 years ago

Hi Brant and Miles,

I am getting the exact same error, except that I am running it on Trinity assemblies. Was this ever figured out?

Thanks.

brantfaircloth commented 5 years ago

Does the directory into which this file is being written already exist? Are all of the paths correct? Basically, the error says that this file cannot be opened for reading, which means the file is not being created before it is read.

shahanderkarabetian commented 5 years ago

Yeah, paths are correct. It's writing the .GATK-coverage-out.log file. Although looking in the log file now, it just shows a GATK runtime error, so I'll try to figure that out. Thanks for the quick response Brant.

brantfaircloth commented 5 years ago

It may be a GATK-version-related thing, particularly if you are using (or getting from bioconda) a new-ish version of GATK. You could try to install an older version and/or GATK-lite to see if that works.

shahanderkarabetian commented 5 years ago

So, I've installed phyluce 1.6 (I was using 1.5) and downloaded and registered GATK 3.8. Now I get 'GATK jar file not found. Have you run "gatk-register"?' I've registered it several times now, and the files are there in $CONDA/bin and $CONDA/jar. I have to use gatk3-register rather than gatk-register; otherwise I always get 'The version of the jar specified, 3.5, does not match the version expected by conda: 3.8', even though the versions actually match as downloaded from the Broad Institute.

If I edit the .phyluce.conf file to point it to gatk3.py or GenomeAnalysisTK.jar, nothing gets written to the log file and I get this error:

2019-07-10 11:34:28,516 - phyluce_assembly_get_trinity_coverage - INFO - Marking read duplicates from BAM for Acumontia_sp_OP4295
2019-07-10 11:34:29,368 - phyluce_assembly_get_trinity_coverage - INFO - Indexing BAM for Acumontia_sp_OP4295
2019-07-10 11:34:29,377 - phyluce_assembly_get_trinity_coverage - INFO - Computing coverage with GATK for Acumontia_sp_OP4295
Traceback (most recent call last):
  File "/Users/shahanderkarabetian/miniconda2/envs/phyluce/bin/phyluce_assembly_get_trinity_coverage", line 238, in <module>
    main()
  File "/Users/shahanderkarabetian/miniconda2/envs/phyluce/bin/phyluce_assembly_get_trinity_coverage", line 222, in main
    coverage = gatk.coverage(log, sample, assembly_pth, assembly, args.cores, bam)
  File "/Users/shahanderkarabetian/miniconda2/envs/phyluce/lib/python2.7/site-packages/phyluce/gatk.py", line 50, in coverage
    proc = subprocess.Popen(cmd, stdout=gatk_out, stderr=subprocess.STDOUT)
  File "/Users/shahanderkarabetian/miniconda2/envs/phyluce/lib/python2.7/subprocess.py", line 394, in __init__
    errread, errwrite)
  File "/Users/shahanderkarabetian/miniconda2/envs/phyluce/lib/python2.7/subprocess.py", line 1047, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

I'm at a loss.

brantfaircloth commented 5 years ago

The way conda works, gatk should end up at a path like /Users/bcf/Anaconda/envs/phyluce/bin/gatk, so that should be the path you use in ~/.phyluce.conf
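
An OSError [Errno 2] raised from subprocess.Popen, as in the traceback above, generally means the program being launched could not be found. One way to test a configured path before re-running is a small check like the one below. This is only a sketch: check_binary is a hypothetical helper, not part of phyluce, and the example path is made up to resemble the one quoted above.

```python
import os
import shutil


def check_binary(path):
    """Return a short diagnosis for a configured executable path."""
    # Expand "~" and environment variables such as $CONDA, if present.
    expanded = os.path.expanduser(os.path.expandvars(path))
    if not os.path.isabs(expanded):
        # Anything that is not absolute is resolved the way a shell would,
        # via a PATH lookup.
        found = shutil.which(expanded)
        return "found on PATH at " + found if found else "not found on PATH"
    if not os.path.isfile(expanded):
        return "missing"
    if not os.access(expanded, os.X_OK):
        return "not executable"
    return "ok"


# Hypothetical path, modelled on the one quoted above.
print(check_binary("~/Anaconda/envs/phyluce/bin/gatk"))
```

Anything other than "ok" for the path written in ~/.phyluce.conf would be consistent with the "No such file or directory" error.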

shahanderkarabetian commented 5 years ago

Okay, turns out the problem wasn't gatk.... it was the shortcut to the contigs.fasta files I made. I was starting with only Trinity fasta files and not the output folder structure you get when running Trinity through phyluce. (hangs head in shame) Thanks for the help though, Brant.
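
For anyone hitting the same thing, a quick check for this kind of layout problem might look like the sketch below. It assumes the layout visible in the first log in this thread, where the --assemblies directory holds one folder per sample and each folder contains contigs.fasta. The function name find_bad_assemblies is hypothetical and not part of phyluce.

```python
from pathlib import Path


def find_bad_assemblies(assemblies_dir):
    """Return names of sample folders lacking a readable contigs.fasta."""
    bad = []
    for sample_dir in sorted(Path(assemblies_dir).iterdir()):
        if not sample_dir.is_dir():
            continue
        contigs = sample_dir / "contigs.fasta"
        # is_file() follows symlinks, so a dangling shortcut counts as missing.
        if not contigs.is_file():
            bad.append(sample_dir.name)
    return bad
```

Calling find_bad_assemblies on the directory passed to --assemblies returns an empty list when every sample folder has a usable contigs.fasta.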

brantfaircloth commented 5 years ago

ha - no sweat!

ymilesz commented 5 years ago

So where should the paths in the assembly.conf file point? Currently mine point to ./clean-fastq/sample name/split-adapter-quality-trimmed/, which contains read1.fastq.gz, read2.fastq.gz, and singleton.fastq.gz.

brantfaircloth commented 5 years ago

That looks like a relative path. The paths should be absolute.
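
A rough way to spot relative entries before running is sketched below. It assumes the config file is INI-style with a [samples] section of name:path lines. The helper name relative_sample_paths, the second sample, and the /data prefix are all made up for illustration.

```python
import configparser
import os


def relative_sample_paths(conf_text):
    """Return {sample: path} for every [samples] entry that is not absolute."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep sample names case-sensitive
    parser.read_string(conf_text)
    return {
        name: path
        for name, path in parser["samples"].items()
        if not os.path.isabs(path)
    }


# Assumed file layout; "other_sample" and "/data" are hypothetical.
example = """
[samples]
NylaM01_bibadia:./clean-fastq/NylaM01_bibadia/split-adapter-quality-trimmed/
other_sample:/data/clean-fastq/other_sample/split-adapter-quality-trimmed/
"""
print(relative_sample_paths(example))
```

Any entry this reports can be rewritten using the full path from the filesystem root, for example by running pwd in the directory that contains clean-fastq and prepending the result.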