BigelowLab / viruscope

Identify viral sequences in single amplified genomes

tetramer output name in signals.cfg #4

Closed superjess81 closed 8 years ago

superjess81 commented 8 years ago

The tetramer PC output file is gzipped, but signals.cfg lists its name without the .gz extension. This makes graphsignals crash unless the file is unzipped or the name is corrected in the configuration file.

btupper commented 8 years ago

I need to see a complete example of the error generated. Perhaps you could point me to an example data set and command string that fails; that way I can reproduce the error and diagnose it directly.

superjess81 commented 8 years ago

Here is an example:

Rscript --vanilla /mnt/scgc_nfs/opt/viralscan/graphsignals.Rscript /mnt/stepanauskas_nfs/jlabonte/WGA-X_manuscript/Output_AD_867/AD-867-A04/signals.cfg
ERROR [2016-03-30 09:42:47] file for /mnt/stepanauskas_nfs/jlabonte/WGA-X_manuscript/Output_AD_867/AD-867-A04/tetramerPCA/AD-867-A04-tetramer-PC.csv not found
Error in file.exists(TMPDIR) : object 'TMPDIR' not found
Calls: get_file_from_config -> viralsignals_quit -> file.exists
3: file.exists(TMPDIR)
2: viralsignals_quit(status = 1)
1: get_file_from_config("tetramer_file", relative = relative_path, path = INPUT_PATH)
INFO [2016-03-30 09:42:47] reading FASTA file: /mnt/stepanauskas_nfs/jlabonte/WGA-X_manuscript/AD-867_2000/AD-867-A04_all_contigs.fasta
INFO [2016-03-30 09:42:47] read_fasta: AD-867-A04_all_contigs.fasta
INFO [2016-03-30 09:42:48] reading blastp file: AD-867-A04_blastp.tsv.gz
INFO [2016-03-30 09:42:48] flagging viral genes in blastp results
INFO [2016-03-30 09:42:48] flagging viral2 genes in blastp results
INFO [2016-03-30 09:42:48] flagging hypothetical viral genes in blastp results
INFO [2016-03-30 09:42:48] reading proteins FASTA file: AD-867-A04_proteins.fasta
INFO [2016-03-30 09:42:48] read_fasta: AD-867-A04_proteins.fasta
INFO [2016-03-30 09:42:48] processing similarity
INFO [2016-03-30 09:42:48] read_similarity: LineP-all.tsv.gz
INFO [2016-03-30 09:42:50] selecting best hit based upon: qseqid length bitscore
INFO [2016-03-30 09:42:51] read_similarity: POV.tsv.gz
INFO [2016-03-30 09:42:51] selecting best hit based upon: qseqid length bitscore
INFO [2016-03-30 09:42:54] process_similarity with provided data.table
INFO [2016-03-30 09:42:57] process_similarity with provided data.table
INFO [2016-03-30 09:42:57] processing pileups
INFO [2016-03-30 09:42:57] identifying empty pileups
INFO [2016-03-30 09:42:57] reading pileups
INFO [2016-03-30 09:42:57] read_pileup: LineP-all.tsv.gz
INFO [2016-03-30 09:42:58] read_pileup: POV.tsv.gz
INFO [2016-03-30 09:42:59] summarizing pileups
Error in basename(TETRAMER_FILE) : object 'TETRAMER_FILE' not found
Calls: flog.info -> .log_level -> layout -> basename
4: basename(TETRAMER_FILE)
3: layout(level, msg, ...)
2: .log_level(msg, ..., level = INFO, name = name, capture = capture)
1: flog.info("processing tetramer: %s", basename(TETRAMER_FILE))
Error in read.table(file = file, header = header, sep = sep, quote = quote, : object 'TETRAMER_FILE' not found
ERROR [2016-03-30 09:43:01] error processing tetramer file

superjess81 commented 8 years ago

We either have to stop zipping the tetramer output, or change the tetramer file name in the signals.cfg file.

btupper commented 8 years ago

This is a deeply nested issue. I have implemented a quick solution that looks for the filename as listed and, if it is not found, tries filename.gz. In theory, the file can now be listed in the config file either way.
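
The fallback described above (try the listed name, then the same name with .gz appended) can be sketched roughly as follows. This is a Python illustration only, not the actual R code in viruscope. The function names `resolve_config_path` and `open_text`, the example file name, and the CSV header are all hypothetical.

```python
import gzip
import os
import tempfile
from typing import Optional


def resolve_config_path(path: str) -> Optional[str]:
    """Return the listed path if it exists, else path + '.gz' if that exists.

    Hypothetical helper: mirrors the fallback described in the thread,
    not the real get_file_from_config() implementation.
    """
    if os.path.isfile(path):
        return path
    gz_candidate = path + ".gz"
    if os.path.isfile(gz_candidate):
        return gz_candidate
    return None  # caller decides how to report the missing file


def open_text(path: str):
    """Open a plain or gzipped text file for reading, based on its extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Only the gzipped file exists on disk (made-up name and contents).
        on_disk = os.path.join(tmp, "example-tetramer-PC.csv.gz")
        with gzip.open(on_disk, "wt") as handle:
            handle.write("contig,PC1,PC2\n")

        # The config lists the name without the .gz extension.
        listed = os.path.join(tmp, "example-tetramer-PC.csv")
        found = resolve_config_path(listed)
        print(os.path.basename(found))
        with open_text(found) as handle:
            print(handle.readline().strip())
```

Returning `None` rather than exiting inside the helper leaves the error handling to the caller, so a missing file gives a single clear error.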

btupper@charlie ~ $ Rscript --vanilla /mnt/scgc_nfs/opt/viralscan/graphsignals.Rscript /mnt/stepanauskas_nfs/jlabonte/WGA-X_manuscript/Output_AD_867/AD-867-A04/signals.cfg
INFO [2016-03-30 10:11:47] reading FASTA file: /mnt/stepanauskas_nfs/jlabonte/WGA-X_manuscript/AD-867_2000/AD-867-A04_all_contigs.fasta
INFO [2016-03-30 10:11:47] read_fasta: AD-867-A04_all_contigs.fasta
INFO [2016-03-30 10:11:47] reading blastp file: AD-867-A04_blastp.tsv.gz
INFO [2016-03-30 10:11:47] flagging viral genes in blastp results
INFO [2016-03-30 10:11:47] flagging viral2 genes in blastp results
INFO [2016-03-30 10:11:47] flagging hypothetical viral genes in blastp results
INFO [2016-03-30 10:11:47] reading proteins FASTA file: AD-867-A04_proteins.fasta
INFO [2016-03-30 10:11:47] read_fasta: AD-867-A04_proteins.fasta
INFO [2016-03-30 10:11:47] processing similarity
INFO [2016-03-30 10:11:47] read_similarity: LineP-all.tsv.gz
INFO [2016-03-30 10:11:49] selecting best hit based upon: qseqid length bitscore
INFO [2016-03-30 10:11:50] read_similarity: POV.tsv.gz
INFO [2016-03-30 10:11:51] selecting best hit based upon: qseqid length bitscore
INFO [2016-03-30 10:11:53] process_similarity with provided data.table
INFO [2016-03-30 10:11:55] process_similarity with provided data.table
INFO [2016-03-30 10:11:55] processing pileups
INFO [2016-03-30 10:11:55] identifying empty pileups
INFO [2016-03-30 10:11:55] reading pileups
INFO [2016-03-30 10:11:55] read_pileup: LineP-all.tsv.gz
INFO [2016-03-30 10:11:56] read_pileup: POV.tsv.gz
INFO [2016-03-30 10:11:57] summarizing pileups
INFO [2016-03-30 10:11:58] processing tetramer: AD-867-A04-tetramer-PC.csv.gz
INFO [2016-03-30 10:11:58] processing tRNA scan file: AD-867-A04-tRNAscan.txt
INFO [2016-03-30 10:11:58] processing proteins for viral genes
INFO [2016-03-30 10:11:58]     no viral genes found
INFO [2016-03-30 10:11:58] processing proteins for viral2 genes
INFO [2016-03-30 10:11:58] processing proteins for hypothetical viral genes
INFO [2016-03-30 10:11:58] processing GC content: AD-867-A04_gc_content.tsv.gz
INFO [2016-03-30 10:11:58] read_gccontent: AD-867-A04_gc_content.tsv.gz
INFO [2016-03-30 10:11:59] saving summary table
Loading required package: class
INFO [2016-03-30 10:12:00] making graphics
INFO [2016-03-30 10:12:00] writing /mnt/stepanauskas_nfs/jlabonte/WGA-X_manuscript/Output_AD_867/AD-867-A04/summary/AD-867-A04.pdf
INFO [2016-03-30 10:12:01] done!

btupper commented 8 years ago

I should point out that this fix currently exists only in the devel branch (which is the version currently exposed).