BosingerLab / BALDR

MIT License

baldr not recognizing inputs in .gz #4

ibseq closed this issue 5 years ago

ibseq commented 5 years ago

Hi there, I'm getting this error when running BALDR: ERROR: Missing input files. Please enter the compressed fastq.gz file/s. Please note the file names should end in ".fastq.gz". For paired end, the files must be separated only by a comma.

command line: ./BALDR --paired ../IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz, ../IB-031218-1_S1_L001_R2_001.fastq.gz

Any advice? Thanks, ibseq

amit-upadhyay commented 5 years ago

I think you have a space between your file names. --paired ../IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz, ../IB-031218-1_S1_L001_R2_001.fastq.gz

Can you please try after removing the space: ./BALDR --paired ../IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz,../IB-031218-1_S1_L001_R2_001.fastq.gz
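Why the space matters: the shell splits "R1.fastq.gz, R2.fastq.gz" at the space, so BALDR receives "R1.fastq.gz," as the --paired value and the R2 path as a separate, stray argument. Below is a minimal bash sketch of a pre-flight check for that value. check_paired_arg is a hypothetical helper written for illustration, not part of BALDR; it only encodes the rule stated in BALDR's own error message (two files ending in .fastq.gz, separated only by a comma).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of BALDR): validate a --paired value before
# submitting a long job. Encodes the rule from BALDR's error message:
# two .fastq.gz files separated only by a comma.
check_paired_arg() {
  local arg="$1"
  # Any whitespace means the pair was typed with a space after the comma.
  if [[ "$arg" =~ [[:space:]] ]]; then
    echo "error: whitespace in --paired value" >&2
    return 1
  fi
  local r1="${arg%%,*}" r2="${arg#*,}"
  # Require exactly one comma, i.e. exactly two file names.
  if [[ "$arg" != *,* || "$r2" == *,* ]]; then
    echo "error: expected exactly two files separated by one comma" >&2
    return 1
  fi
  local f
  for f in "$r1" "$r2"; do
    # An empty R2 (e.g. "R1.fastq.gz,") also fails here.
    if [[ "$f" != *.fastq.gz ]]; then
      echo "error: '$f' does not end in .fastq.gz" >&2
      return 1
    fi
  done
}
```

For example, check_paired_arg "../R1.TRIM.fastq.gz,../R2.fastq.gz" succeeds, while the value BALDR actually received in the original command ("../R1.TRIM.fastq.gz,") is rejected because the second file name is empty.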


ibseq commented 5 years ago

I got this, and no outputs:
(BALDR) ibassano@login-2-internal:~/WORK/BALDR$ less baldr.pbs.e2225366
(BALDR) ibassano@login-2-internal:~/WORK/BALDR$ less baldr.pbs.e2225490

Logging in as anonymous ... Logged in!
==> SYST ... done.
==> PWD ... done.
==> TYPE I ... done.
==> CWD (1) /pub/release-86/fasta/homo_sapiens/dna ... done.
==> SIZE Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz ... 881214448
==> PASV ... done.
==> RETR Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz ... done.
/work/ibassano/packages/BALDR-master/BALDR/resources/STAR_genomes/human: Not a directory
/work/ibassano/packages/BALDR-master/BALDR/resources/STAR_genomes/human/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz: Not a directory
gzip: /work/ibassano/packages/BALDR-master/BALDR/resources/STAR_genomes/human/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz: Not a directory
wget: /rds/general/user/ibassano/home/anaconda3/lib/libuuid.so.1: no version information available (required by wget)
--2018-12-17 15:24:00-- ftp://ftp.ensembl.org/pub/release-86/gtf/homo_sapiens/Homo_sapiens.GRCh38.86.gtf.gz
=> ‘/work/ibassano/packages/BALDR-master/BALDR/resources/STAR_genomes/human/Homo_sapiens.GRCh38.86.gtf.gz’
Resolving ftp.ensembl.org (ftp.ensembl.org)... 193.62.193.8
Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.193.8|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.
==> PWD ... done.
==> TYPE I ... done.
==> CWD (1) /pub/release-86/gtf/homo_sapiens ... done.
==> SIZE Homo_sapiens.GRCh38.86.gtf.gz ... 45758566
==> PASV ... done.
==> RETR Homo_sapiens.GRCh38.86.gtf.gz ... done.
/work/ibassano/packages/BALDR-master/BALDR/resources/STAR_genomes/human: Not a directory
/work/ibassano/packages/BALDR-master/BALDR/resources/STAR_genomes/human/Homo_sapiens.GRCh38.86.gtf.gz: Not a directory
gzip: /work/ibassano/packages/BALDR-master/BALDR/resources/STAR_genomes/human/Homo_sapiens.GRCh38.86.gtf.gz: Not a directory

genomeGenerate.cpp:150:genomeGenerate: exiting because of OUTPUT FILE error: could not create output file /work/ibassano/packages/BALDR-master/BALDR/resources/STAR_genomes/human/GRCh38_primary_ensembl86/genomeParameters.txt Solution: check that the path exists and you have write permission for this file

Dec 17 15:24:00 ...... FATAL ERROR, exiting
TrimmomaticPE: Started with arguments: -threads 1 -trimlog Trimmed/Log/_trim.log /work/ibassano/IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz /work/ibassano/IB-031218-1_S1_L001_R2_001.fastq.gz Trimmed/_nexteratrim_1P.fastq.gz Trimmed/_nexteratrim_1U.fastq.gz Trimmed/_nexteratrim_2P.fastq.gz Trimmed/_nexteratrim_2U.fastq.gz ILLUMINACLIP:/home/ibassano/anaconda3/envs/BALDR/share/trimmomatic-0.33-0/adapters/NexteraPE-PE.fa:2:30:10
Using PrefixPair: 'AGATGTGTATAAGAGACAG' and 'AGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTCCGAGCCCACGAGAC'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTGACGCTGCCGACGA'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Read Pairs: 83857 Both Surviving: 83857 (100.00%) Forward Only Surviving: 0 (0.00%) Reverse Only Surviving: 0 (0.00%) Dropped: 0 (0.00%)
TrimmomaticPE: Completed successfully

EXITING because of FATAL ERROR: could not open genome file /work/ibassano/packages/BALDR-master/BALDR/resources/STAR_genomes/human/GRCh38_primary_ensembl86/genomeParameters.txt SOLUTION: check that the path to genome files, specified in --genomeDir is correct and the files are present, and have user read permsissions

Dec 17 15:24:12 ...... FATAL ERROR, exiting

gzip: stdout: Broken pipe

gzip: stdout: Broken pipe
samtools index: "STAR/.STAR.sorted.bam" is in a format that cannot be usefully indexed
cat: /work/ibassano/packages/BALDR-master/BALDR/resources/IG_loci/human/IG_loci_human.txt: Not a directory
Can't exec "/work/ibassano/packages/BALDR-master/BALDR/lib/parse_igblast.pl": Not a directory at /work/ibassano/packages/BALDR-master/BALDR line 662.
Error: could not open IG-mapped_Unmapped/Quantification/full/.IG-mapped_Unmapped.full.fa
Error: Encountered internal Bowtie 2 exception (#1)
Command: bowtie2-build --wrapper basic-0 IG-mapped_Unmapped/Quantification/full/.IG-mapped_Unmapped.full.fa IG-mapped_Unmapped/Quantification/full/index/.IG-mapped_Unmapped
sh: IG-mapped_Unmapped/Quantification/full/counts/.IG-mapped_Unmapped_full_bt2_numreadsmapped: No such file or directory
(ERR): mkfifo(/tmp/30625.inpipe1) failed.
Exiting now ...
Error: could not open IG-mapped_Unmapped/Quantification/VDJ/.IG-mapped_Unmapped.VDJ.fa
Error: Encountered internal Bowtie 2 exception (#1)
Command: bowtie2-build --wrapper basic-0 IG-mapped_Unmapped/Quantification/VDJ/.IG-mapped_Unmapped.VDJ.fa IG-mapped_Unmapped/Quantification/VDJ/index/.IG-mapped_Unmapped
sh: IG-mapped_Unmapped/Quantification/VDJ/counts/.IG-mapped_Unmapped_VDJ_bt2_numreadsmapped: No such file or directory
(ERR): mkfifo(/tmp/30633.inpipe1) failed.
Exiting now ...
Can't exec "/work/ibassano/packages/BALDR-master/BALDR/lib/add_quantification.pl": Not a directory at /work/ibassano/packages/BALDR-master/BALDR line 705.
grep: IG-mapped_Unmapped/IgBLAST_quant/.IG-mapped_Unmapped.igblast_tabular.quant: No such file or directory
grep: IG-mapped_Unmapped/IgBLAST_quant/.IG-mapped_Unmapped.igblast_tabular.quant: No such file or directory
Can't exec "/work/ibassano/packages/BALDR-master/BALDR/lib/filter.pl": Not a directory at /work/ibassano/packages/BALDR-master/BALDR line 742.
Can't exec "/work/ibassano/packages/BALDR-master/BALDR/lib/filter.pl": Not a directory at /work/ibassano/packages/BALDR-master/BALDR line 742.
cp: target ‘/rds/general/user/ibassano/home/WORK/results’ is not a directory

Command line used: /work/ibassano/packages/BALDR-master/BALDR --adapter /home/ibassano/anaconda3/envs/BALDR/share/trimmomatic-0.33-0/adapters/NexteraPE-PE.fa --trimmomatic /home/ibassano/anaconda3/envs/BALDR/share/trimmomatic-0.33-0/trimmomatic.jar --BALDR /work/ibassano/packages/BALDR-master/BALDR --paired /work/ibassano/IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz,/work/ibassano/IB-031218-1_S1_L001_R2_001.fastq.gz


ibseq commented 5 years ago

And this:
(BALDR) ibassano@login-2-internal:~/WORK/BALDR$ less baldr.pbs.o2225490

2018-12-17 15:24:13 Merging IG & Unmapped reads and writing to IG-mapped_Unmapped/IG-mapped_Unmapped_fastq
2018-12-17 15:24:13 Running Trinity assembly for --left IG-mapped_Unmapped/IG-mapped_Unmapped_fastq/_1P.IG_Unmapped.fastq.gz --right IG-mapped_Unmapped/IG-mapped_Unmapped_fastq/_2P.IG_Unmapped.fastq.gz.
/usr/bin/time -v Trinity --seqType fq --full_cleanup --max_memory 32G --left IG-mapped_Unmapped/IG-mapped_Unmapped_fastq/_1P.IG_Unmapped.fastq.gz --right IG-mapped_Unmapped/IG-mapped_Unmapped_fastq/_2P.IG_Unmapped.fastq.gz --CPU 1 --no_normalize_reads --output IG-mapped_Unmapped/Trinity/.IG-mapped_Unmapped_trinity &> IG-mapped_Unmapped/Trinity/Log/.IG-mapped_Unmapped_trinity_log
2018-12-17 15:24:13 Finished Trinity assembly. Assembled reads written in IG-mapped_Unmapped/Trinity.
igblastn -germline_db_V /work/ibassano/packages/BALDR-master/BALDR/resources/IgBLAST_DB/human/human_IG_V -germline_db_J /work/ibassano/packages/BALDR-master/BALDR/resources/IgBLAST_DB/human/human_IG_J -germline_db_D /work/ibassano/packages/BALDR-master/BALDR/resources/IgBLAST_DB/human/human_IG_D -organism human -domain_system imgt -query IG-mapped_Unmapped/Trinity/.IG-mapped_Unmapped_trinity.Trinity.fasta -auxiliary_data /work/ibassano/packages/BALDR-master/BALDR/resources/IgBLAST_DB/optional_file/human_gl.aux -show_translation -out IG-mapped_Unmapped/IgBLAST/.IG-mapped_Unmapped.blastout -num_threads 1 -db /work/ibassano/packages/BALDR-master/BALDR/resources/IgBLAST_DB/human/human_IG_C -evalue 0.001 -outfmt 7 -max_target_seqs 5 -max_hsps 1 &> IG-mapped_Unmapped/IgBLAST/.IG-mapped_Unmapped.blastout.log
bowtie2-build IG-mapped_Unmapped/Quantification/full/.IG-mapped_Unmapped.full.fa IG-mapped_Unmapped/Quantification/full/index/.IG-mapped_Unmapped
Settings:
  Output files: "IG-mapped_Unmapped/Quantification/full/index/.IG-mapped_Unmapped..bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  IG-mapped_Unmapped/Quantification/full/.IG-mapped_Unmapped.full.fa
Total time for call to driver() for forward index: 00:00:00
bowtie2 -p 1 --no-unal --no-hd --no-discordant --gbar 1000 --end-to-end -a -x IG-mapped_Unmapped/Quantification/full/index/.IG-mapped_Unmapped -1 IG-mapped_Unmapped/IG-mapped_Unmapped_fastq/_1P.IG_Unmapped.fastq.gz -2 IG-mapped_Unmapped/IG-mapped_Unmapped_fastq/_2P.IG_Unmapped.fastq.gz | cut -f 3 | sort | uniq -c | sort -nr 1> IG-mapped_Unmapped/Quantification/full/counts/.IG-mapped_Unmapped_full_bt2_numreadsmapped
bowtie2-build IG-mapped_Unmapped/Quantification/VDJ/.IG-mapped_Unmapped.VDJ.fa IG-mapped_Unmapped/Quantification/VDJ/index/.IG-mapped_Unmapped
Settings:
  Output files: "IG-mapped_Unmapped/Quantification/VDJ/index/.IG-mapped_Unmapped..bt2"
  Line rate: 6 (line is 64 bytes)
  Lines per side: 1 (side is 64 bytes)
  Offset rate: 4 (one in 16)
  FTable chars: 10
  Strings: unpacked
  Max bucket size: default
  Max bucket size, sqrt multiplier: default
  Max bucket size, len divisor: 4
  Difference-cover sample period: 1024
  Endianness: little
  Actual local endianness: little
  Sanity checking: disabled
  Assertions: disabled
  Random seed: 0
  Sizeofs: void:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
  IG-mapped_Unmapped/Quantification/VDJ/.IG-mapped_Unmapped.VDJ.fa
Total time for call to driver() for forward index: 00:00:00
bowtie2 -p 1 --no-unal --no-hd --no-discordant --gbar 1000 --end-to-end -a -x IG-mapped_Unmapped/Quantification/VDJ/index/.IG-mapped_Unmapped -1 IG-mapped_Unmapped/IG-mapped_Unmapped_fastq/_1P.IG_Unmapped.fastq.gz -2 IG-mapped_Unmapped/IG-mapped_Unmapped_fastq/_2P.IG_Unmapped.fastq.gz | cut -f 3 | sort | uniq -c | sort -nr 1> IG-mapped_Unmapped/Quantification/VDJ/counts/.IG-mapped_Unmapped_VDJ_bt2_numreadsmapped

============================================

    Job resource usage summary

                 Memory (GB)    NCPUs
    Requested :  70             16
    Used      :  0 (peak)       0.56 (ave)

============================================


amit-upadhyay commented 5 years ago

The step where the genome index is generated failed. Can you please try creating a STAR index manually: https://github.com/BosingerLab/BALDR/issues/1#issuecomment-388075420

You can then pass the location in the --STAR flag.

ibseq commented 5 years ago

Where exactly do I add it? I have run two types of command. This one, for which I sent you the log.e and log.o: /work/ibassano/packages/BALDR-master/BALDR --adapter /home/ibassano/anaconda3/envs/BALDR/share/trimmomatic-0.33-0/adapters/NexteraPE-PE.fa --trimmomatic /home/ibassano/anaconda3/envs/BALDR/share/trimmomatic-0.33-0/trimmomatic.jar --BALDR /work/ibassano/packages/BALDR-master/BALDR --paired /work/ibassano/IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz,/work/ibassano/IB-031218-1_S1_L001_R2_001.fastq.gz

And a simpler one, which is still running (not sure if there are differences): /work/ibassano/packages/BALDR-master/BALDR --paired /work/ibassano/IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz,/work/ibassano/IB-031218-1_S1_L001_R2_001.fastq.gz

On 17 Dec 2018, at 16:08, Amit Upadhyay notifications@github.com wrote:

Please add the STAR genome index location to the --STAR_index flag when you run the BALDR command.

On Mon, Dec 17, 2018 at 11:02 AM ibseq notifications@github.com wrote:

I recall the same issue as last time.

Yes, I have it. What do I do?


amit-upadhyay commented 5 years ago

/work/ibassano/packages/BALDR-master/BALDR --adapter /home/ibassano/anaconda3/envs/BALDR/share/trimmomatic-0.33-0/adapters/NexteraPE-PE.fa --trimmomatic /home/ibassano/anaconda3/envs/BALDR/share/trimmomatic-0.33-0/trimmomatic.jar --BALDR /work/ibassano/packages/BALDR-master/BALDR --paired /work/ibassano/IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz,/work/ibassano/IB-031218-1_S1_L001_R2_001.fastq.gz --trinity </path/to/Trinity/binary> --STAR </path/to/STAR/binary> --STAR_index </path/to/STAR/genome/index>

ibseq commented 5 years ago

Hi Amit, thanks again.

I have no idea where the STAR and Trinity binaries are in the BALDR package. Any help on this?

Thanks, Irene

(And the simple command, BALDR with just read1 and read2, doesn’t work.)


amit-upadhyay commented 5 years ago

The STAR and Trinity binaries are not included in the BALDR package. They have to be installed separately.

https://github.com/alexdobin/STAR https://github.com/trinityrnaseq/trinityrnaseq/releases/tag/Trinity-v2.3.2

amit-upadhyay commented 5 years ago

One thing to note is that BALDR does not currently work with newer versions of Trinity. It requires Trinity v2.3.2. If you happen to have the same version, you can try the following:

which STAR

which Trinity

These should give you the path for the binaries that you can use for the flags.
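The same lookup can be scripted before a job is submitted. A hedged bash sketch: command -v is the POSIX shell counterpart of which; find_tool and is_trinity_2_3_2 are hypothetical helpers, not part of BALDR or Trinity, and because the exact wording Trinity prints for its version is an assumption here, the check only looks for "v2.3.2" (not followed by another digit) in whatever string it is given.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of BALDR) for locating STAR and Trinity.

# Print the full path of an executable found on PATH, or fail.
find_tool() {
  local path
  path="$(command -v "$1")" || {
    echo "error: $1 not found on PATH" >&2
    return 1
  }
  echo "$path"
}

# Succeed if a version string mentions v2.3.2 (and not e.g. v2.3.20).
# The format of Trinity's version output is an assumption.
is_trinity_2_3_2() {
  local re='v2\.3\.2([^0-9]|$)'
  [[ "$1" =~ $re ]]
}
```

For example, find_tool STAR and find_tool Trinity print the paths to use, and is_trinity_2_3_2 "$(Trinity --version 2>&1)" is one way to confirm the installed version before running BALDR.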

If these tools are not available on your server, you can install them locally in one of your directories.

For STAR, you can download https://github.com/alexdobin/STAR/archive/2.5.2b.tar.gz. After extracting it, the binary is available at STAR-2.5.2b/bin/Linux_x86_64/STAR.

For Trinity, download: https://github.com/trinityrnaseq/trinityrnaseq/archive/Trinity-v2.3.2.tar.gz

Follow the instructions on https://github.com/trinityrnaseq/trinityrnaseq/wiki/Installing-Trinity
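The STAR part of these instructions as a bash sketch, assuming wget and tar are available and the machine has network access. install_star and star_binary_path are hypothetical helpers; the archive URL and the binary location inside the extracted folder are the ones given in this comment.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of BALDR) for a local STAR 2.5.2b install.
STAR_VERSION="2.5.2b"

# Location of the prebuilt Linux binary after extracting the release
# archive into directory $1 (layout as described in the thread).
star_binary_path() {
  echo "$1/STAR-${STAR_VERSION}/bin/Linux_x86_64/STAR"
}

# Download and extract into $1, then print the binary path.
# The body is a subshell, so "cd" and "set -e" do not affect the caller.
install_star() (
  set -e
  mkdir -p "$1"
  cd "$1"
  wget -O "STAR-${STAR_VERSION}.tar.gz" \
    "https://github.com/alexdobin/STAR/archive/${STAR_VERSION}.tar.gz"
  tar -xzf "STAR-${STAR_VERSION}.tar.gz"
  star_binary_path "$(pwd)"
)
```

For example, install_star "$HOME/tools" should leave the binary at the path printed by star_binary_path "$HOME/tools". Trinity v2.3.2 is compiled from source, so it follows the separate instructions linked above.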

amit-upadhyay commented 5 years ago

Hi Irene,

I think the STAR genome index has not been generated correctly. Can you please check if all these files are present in the STAR genome index directory:

$ ls -l
total 34634416
-rwxrwxrwx. 1 aupadh4 yerkes 1200 May 9 13:54 chrLength.txt
-rwxrwxrwx. 1 aupadh4 yerkes 3123 May 9 13:54 chrNameLength.txt
-rwxrwxrwx. 1 aupadh4 yerkes 1923 May 9 13:54 chrName.txt
-rwxrwxrwx. 1 aupadh4 yerkes 2129 May 9 13:54 chrStart.txt
-rwxrwxrwx. 1 aupadh4 yerkes 41854837 May 9 13:54 exonGeTrInfo.tab
-rwxrwxrwx. 1 aupadh4 yerkes 16985258 May 9 13:54 exonInfo.tab
-rwxrwxrwx. 1 aupadh4 yerkes 928822 May 9 13:54 geneInfo.tab
-rwxrwxrwx. 1 aupadh4 yerkes 3208868819 May 9 13:54 Genome
-rwxrwxrwx. 1 aupadh4 yerkes 835 May 9 13:54 genomeParameters.txt
-rwxrwxrwx. 1 aupadh4 yerkes 0 May 9 13:48 README
-rwxrwxrwx. 1 aupadh4 yerkes 24881828956 May 9 13:59 SA
-rwxrwxrwx. 1 aupadh4 yerkes 1565873619 May 9 13:59 SAindex
-rwxrwxrwx. 1 aupadh4 yerkes 10235563 May 9 13:54 sjdbInfo.txt
-rwxrwxrwx. 1 aupadh4 yerkes 8020752 May 9 13:54 sjdbList.fromGTF.out.tab
-rwxrwxrwx. 1 aupadh4 yerkes 8019182 May 9 13:54 sjdbList.out.tab
-rwxrwxrwx. 1 aupadh4 yerkes 11688566 May 9 13:54 transcriptInfo.tab
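That check can be automated by comparing a candidate directory against the file names in this listing. A bash sketch: check_star_index and STAR_INDEX_FILES are hypothetical names, the list is copied from the listing (README is left out because it is empty there), and treating an empty file as a sign of an unfinished genomeGenerate run is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of BALDR): report which files from the
# listing are missing or empty in a STAR genome index directory.
STAR_INDEX_FILES=(
  chrLength.txt chrNameLength.txt chrName.txt chrStart.txt
  exonGeTrInfo.tab exonInfo.tab geneInfo.tab Genome genomeParameters.txt
  SA SAindex sjdbInfo.txt sjdbList.fromGTF.out.tab sjdbList.out.tab
  transcriptInfo.tab
)

check_star_index() {
  local dir="$1" f missing=0
  if [[ ! -d "$dir" ]]; then
    echo "error: $dir is not a directory" >&2
    return 1
  fi
  for f in "${STAR_INDEX_FILES[@]}"; do
    # -s is true only for files that exist and are non-empty.
    if [[ ! -s "$dir/$f" ]]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_star_index on the directory intended for --STAR_index prints nothing and returns 0 when every listed file is present and non-empty.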

ibseq commented 5 years ago

Hi Amit, I am not sure where this directory is.

In the downloaded BALDR package there is a folder called resources/STAR_genomes/human, but that’s it, and it doesn’t have all the items you listed.

If I run “which STAR”, I can see where STAR is, but it is not a directory.

Irene


amit-upadhyay commented 5 years ago

Before running BALDR, please do the following:

mkdir -p /path/to/BALDR/resources/STAR_genomes/human/STAR_GRCh38_index

cd /path/to/BALDR/resources/STAR_genomes/human/

wget ftp://ftp.ensembl.org/pub/release-86/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

wget ftp://ftp.ensembl.org/pub/release-86/gtf/homo_sapiens/Homo_sapiens.GRCh38.86.gtf.gz

gunzip *.gz

STAR --runThreadN 16 --runMode genomeGenerate --genomeDir STAR_GRCh38_index --genomeFastaFiles Homo_sapiens.GRCh38.dna.primary_assembly.fa --sjdbGTFfile Homo_sapiens.GRCh38.86.gtf --sjdbOverhang 100

Once the genome index has been generated successfully, you can add the following flag when running BALDR:

--STAR_index /path/to/BALDR/resources/STAR_genomes/human/STAR_GRCh38_index
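The steps above can be collected into one re-runnable function. This is a bash sketch assuming wget, gunzip and STAR are on PATH; the function names, the DRY_RUN switch and the two arguments (the BALDR directory and a thread count) are hypothetical, not part of BALDR. The commands themselves are the ones from this comment, except that the two downloaded files are named explicitly instead of using *.gz.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of BALDR) around the index-building steps.
# $1 = BALDR directory, $2 = threads (default 16).
# Set DRY_RUN=1 to print each command instead of running it.

run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# The body is a subshell, so "cd" and "set -e" do not leak to the caller.
build_star_index() (
  set -e
  human_dir="$1/resources/STAR_genomes/human"
  threads="${2:-16}"
  ens="ftp://ftp.ensembl.org/pub/release-86"
  fa="Homo_sapiens.GRCh38.dna.primary_assembly.fa"
  gtf="Homo_sapiens.GRCh38.86.gtf"

  run mkdir -p "$human_dir/STAR_GRCh38_index"
  run cd "$human_dir"
  run wget "$ens/fasta/homo_sapiens/dna/$fa.gz"
  run wget "$ens/gtf/homo_sapiens/$gtf.gz"
  run gunzip "$fa.gz" "$gtf.gz"
  run STAR --runThreadN "$threads" --runMode genomeGenerate \
    --genomeDir STAR_GRCh38_index --genomeFastaFiles "$fa" \
    --sjdbGTFfile "$gtf" --sjdbOverhang 100

  # The flag to add to the BALDR command afterwards.
  printf '%s\n' "--STAR_index $human_dir/STAR_GRCh38_index"
)
```

For example, DRY_RUN=1 build_star_index /work/ibassano/packages/BALDR-master 16 only prints the commands, which makes it easy to confirm the paths before a long run. The --sjdbOverhang 100 value is kept from the comment; the STAR manual describes the ideal value as read length minus 1, so it may be worth matching it to your reads.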

ibseq commented 5 years ago

Hi Amit, I was looking at all the files in BALDR-master/resources/STAR_genomes/human. One folder there is called GRCh38_primary_ensembl86, and it has all the files you listed. Will this make a difference?

I am running the last command from your email, following IT's reply to me (STAR --runThreadN 16 --runMode genomeGenerate --genomeDir STAR_GRCh38_index --genomeFastaFiles Homo_sapiens.GRCh38.dna.primary_assembly.fa --sjdbGTFfile Homo_sapiens.GRCh38.86.gtf --sjdbOverhang 100).

It takes a while…


amit-upadhyay commented 5 years ago

It can take some time for STAR to generate the index.

If you already have the files in BALDR-master/resources/STAR_genomes/human/GRCh38_primary_ensembl86, you can add the following flag to the BALDR command line:

--STAR_index BALDR-master/resources/STAR_genomes/human/GRCh38_primary_ensembl86
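A relative --STAR_index value such as the one above is presumably resolved against the directory the job runs in, which is not necessarily the directory it was submitted from (on many PBS systems batch jobs start in the home directory, and the submission directory is available as $PBS_O_WORKDIR). A hedged bash sketch that converts the path to an absolute one before launching; abs_dir is a hypothetical helper, not part of BALDR.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of BALDR): print the absolute,
# symlink-resolved form of directory $1, or fail if it does not exist.
abs_dir() {
  # Subshell keeps the caller's working directory unchanged; clearing
  # CDPATH stops cd from jumping elsewhere or echoing the directory.
  ( CDPATH= cd -- "$1" 2>/dev/null && pwd -P ) || {
    echo "error: no such directory: $1" >&2
    return 1
  }
}
```

For example, run from the directory that contains BALDR-master, --STAR_index "$(abs_dir BALDR-master/resources/STAR_genomes/human/GRCh38_primary_ensembl86)" passes a full path, and the command fails early if the directory is not where you expect.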

ibseq commented 5 years ago

Cool, running this now:

/work/ibassano/packages/BALDR-master/BALDR --adapter /home/ibassano/anaconda3/envs/BALDR/share/trimmomatic-0.33-0/adapters/NexteraPE-PE.fa --trimmomatic /home/ibassano/anaconda3/envs/BALDR/share/trimmomatic-0.33-0/trimmomatic.jar --BALDR /work/ibassano/packages/BALDR-master/BALDR --paired /work/ibassano/raw_data/IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz,/work/ibassano/raw_data/IB-031218-1_S1_L001_R2_001.fastq.gz --trinity /home/ibassano/anaconda3/envs/BALDR/bin/Trinity --STAR /home/ibassano/anaconda3/envs/BALDR/bin --STAR_index BALDR-master/resources/STAR_genomes/human/GRCh38_primary_ensembl86


amit-upadhyay commented 5 years ago

You have to make the following two changes: --BALDR has to be the path of the BALDR-master directory, not the BALDR executable; --STAR has to specify the STAR binary file itself, not its bin directory.

/work/ibassano/packages/BALDR-master/BALDR --adapter /home/ibassano/anaconda3/envs/BALDR/share/trimmomatic-0.33-0/adapters/NexteraPE-PE.fa --trimmomatic /home/ibassano/anaconda3/envs/BALDR/share/trimmomatic-0.33-0/trimmomatic.jar --BALDR /work/ibassano/packages/BALDR-master --paired /work/ibassano/raw_data/IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz,/work/ibassano/raw_data/IB-031218-1_S1_L001_R2_001.fastq.gz --trinity /home/ibassano/anaconda3/envs/BALDR/bin/Trinity --STAR /home/ibassano/anaconda3/envs/BALDR/bin/STAR --STAR_index BALDR-master/resources/STAR_genomes/human/GRCh38_primary_ensembl86
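The "Not a directory" errors in the first log are consistent with the first change: with --BALDR set to /work/ibassano/packages/BALDR-master/BALDR (the script itself), paths such as .../BALDR-master/BALDR/resources/STAR_genomes/human and .../BALDR-master/BALDR/lib/filter.pl point inside a file. A minimal bash pre-flight check for both flags; check_baldr_paths is hypothetical, not part of BALDR, and it assumes only that the BALDR-master directory contains resources/, as the paths in this thread show.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of BALDR).
#   $1 = --BALDR value   (should be the BALDR-master directory)
#   $2 = --STAR value    (should be the STAR executable, not bin/)
#   $3 = --trinity value (should be the Trinity executable)
check_baldr_paths() {
  local baldr_dir="$1" star="$2" trinity="$3" status=0 exe
  # Fails when $1 is the BALDR script: a path "inside" a file is not a dir.
  if [[ ! -d "$baldr_dir/resources" ]]; then
    echo "error: --BALDR '$baldr_dir' has no resources/ directory" >&2
    status=1
  fi
  for exe in "$star" "$trinity"; do
    # Directories are "executable" too, so also require a regular file.
    if [[ ! -f "$exe" || ! -x "$exe" ]]; then
      echo "error: '$exe' is not an executable file" >&2
      status=1
    fi
  done
  return "$status"
}
```

With the corrected values from this comment, check_baldr_paths /work/ibassano/packages/BALDR-master /home/ibassano/anaconda3/envs/BALDR/bin/STAR /home/ibassano/anaconda3/envs/BALDR/bin/Trinity should pass, while the earlier values (the BALDR script and the bin directory) are both reported.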

ibseq commented 5 years ago

This one has run successfully. Is it useful?

STAR --runThreadN 16 --runMode genomeGenerate --genomeDir STAR_GRCh38_index --genomeFastaFiles Homo_sapiens.GRCh38.dna.primary_assembly.fa --sjdbGTFfile Homo_sapiens.GRCh38.86.gtf --sjdbOverhang 100


amit-upadhyay commented 5 years ago

Sure. You can try using the new index.


ibseq commented 5 years ago

Going crazy. I ran: /work/ibassano/packages/BALDR-master/BALDR --adapter /home/ibassano/anaconda3/envs/BALDR/share/trimmomatic-0.33-0/adapters/NexteraPE-PE.fa --trimmomatic /home/ibassano/anaconda3/envs/BALDR/share/trimmomatic-0.33-0/trimmomatic.jar --BALDR /work/ibassano/packages/BALDR-master --paired /work/ibassano/raw_data/IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz,/work/ibassano/raw_data/IB-031218-1_S1_L001_R2_001.fastq.gz --trinity /home/ibassano/anaconda3/envs/BALDR/bin/Trinity --STAR /home/ibassano/anaconda3/envs/BALDR/bin/STAR --STAR_index /work/ibassano/packages/BALDR-master/resources/STAR_genomes/human/STAR_GRCh38_index

What's wrong now? The Trinity path is correct, right?

stderr:

Unknown option: trinity
Usage:
Single-end: ./BALDR --single

Paired-end: ./BALDR --paired <R1.fastq.gz,R2.fastq.gz> <options>

Options:

  --method       One or more reconstruction methods. For multiple methods, separte only by comma
                 human: IG-mapped_Unmapped (default), Unfiltered, IG-mapped_only, IMGT-mapped, Recombinome-mapped 
                 rhesus_monkey: FilterNonIG (default), Unfiltered, IG-mapped_only, IG-mapped_Unmapped
  --organism     human (default) or rhesus_monkey
  --trimmomatic  Path for trimmomatic.jar file (e.g. ~/Trimmomatic-0.36/trimmomatic-0.36.jar) (required)
  --adapter      Path for the Trimmomatic adapter file (e.g. ~/Trimmomatic-0.36/adapters/NexteraPE-PE.fa) (required)
  --STAR_index   Path for the STAR aligner genome index
  --BALDR        Path for the BALDR directory (e.g. ~/BALDR) (required)
  --memory       Max memory for Trinity (default 32G)
  --threads      number of threads for STAR/bowtie2/Trinity (default 1)
  --version      Version
  --help         Print this help
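The usage text expects `--paired <R1.fastq.gz,R2.fastq.gz>`: one argument, two files, one comma. A space after the comma (the mistake at the start of this thread) makes the shell split it, so BALDR receives `R1.fastq.gz,` on its own. A minimal sketch of an equivalent check; `split_paired` and `is_valid_paired` are hypothetical names, not BALDR's own parsing code, and the `.fastq.gz` suffix rule is taken from BALDR's error message quoted at the top of the issue.

```python
from typing import Tuple

def split_paired(arg: str) -> Tuple[str, str]:
    """Split a --paired value into (R1, R2) or raise ValueError explaining what is wrong."""
    if any(ch.isspace() for ch in arg):
        raise ValueError("remove the spaces: R1 and R2 must be joined by a single comma")
    parts = arg.split(",")
    # A space after the comma in the shell leaves "R1.fastq.gz," here, i.e. an empty R2
    if len(parts) != 2 or not all(parts):
        raise ValueError("expected exactly two files separated by one comma")
    for name in parts:
        if not name.endswith(".fastq.gz"):
            raise ValueError(f"file name must end in .fastq.gz: {name}")
    return parts[0], parts[1]

def is_valid_paired(arg: str) -> bool:
    """Boolean wrapper around split_paired for quick checks."""
    try:
        split_paired(arg)
    except ValueError:
        return False
    return True
```

For example, `is_valid_paired("S1_R1.fastq.gz,")` is False, which is what BALDR effectively sees when the shell splits on the stray space.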

On 19 Dec 2018, at 16:32, Amit Upadhyay notifications@github.com wrote:

Sure. You can try using the new index.

On Wed, Dec 19, 2018, 11:26 AM ibseq <notifications@github.com> wrote:

this one ran successfully: useful?

STAR --runThreadN 16 --runMode genomeGenerate --genomeDir STAR_GRCh38_index --genomeFastaFiles Homo_sapiens.GRCh38.dna.primary_assembly.fa --sjdbGTFfile Homo_sapiens.GRCh38.86.gtf --sjdbOverhang 100


ibseq commented 5 years ago

Hi Amit, I removed the --trinity option from the command line and got this:

TrimmomaticPE: Started with arguments: -threads 1 -trimlog Trimmed/Log/_trim.log /work/ibassano/raw_data/IB-031218-1_S1_L001_R1_001.TRIM.fastq.gz /work/ibassano/raw_data/IB-031218-1_S1_L001_R2_001.fastq.gz Trimmed/_nexteratrim_1P.fastq.gz Trimmed/_nexteratrim_1U.fastq.gz Trimmed/_nexteratrim_2P.fastq.gz Trimmed/_nexteratrim_2U.fastq.gz ILLUMINACLIP:/home/ibassano/anaconda3/envs/BALDR/share/trimmomatic-0.33-0/adapters/NexteraPE-PE.fa:2:30:10
Using PrefixPair: 'AGATGTGTATAAGAGACAG' and 'AGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTCCGAGCCCACGAGAC'
Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTGACGCTGCCGACGA'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Read Pairs: 83857 Both Surviving: 83857 (100.00%) Forward Only Surviving: 0 (0.00%) Reverse Only Surviving: 0 (0.00%) Dropped: 0 (0.00%)
TrimmomaticPE: Completed successfully
[bam_header_read] EOF marker is absent. The input is probably truncated.
No such file or directory at /work/ibassano/packages/BALDR-master/lib/parse_igblast.pl line 45.
Error: could not open IG-mapped_Unmapped/Quantification/full/.IG-mapped_Unmapped.full.fa
Error: Encountered internal Bowtie 2 exception (#1)
Command: bowtie2-build --wrapper basic-0 IG-mapped_Unmapped/Quantification/full/.IG-mapped_Unmapped.full.fa IG-mapped_Unmapped/Quantification/full/index/.IG-mapped_Unmapped
(ERR): mkfifo(/tmp/84294.inpipe1) failed.
Exiting now ...
Error: could not open IG-mapped_Unmapped/Quantification/VDJ/.IG-mapped_Unmapped.VDJ.fa
Error: Encountered internal Bowtie 2 exception (#1)
Command: bowtie2-build --wrapper basic-0 IG-mapped_Unmapped/Quantification/VDJ/.IG-mapped_Unmapped.VDJ.fa IG-mapped_Unmapped/Quantification/VDJ/index/.IG-mapped_Unmapped
(ERR): mkfifo(/tmp/84302.inpipe1) failed.
Exiting now ...
IG-mapped_Unmapped/IgBLAST/tabular/.IG-mapped_Unmapped.igblast_tabular not found
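The `[bam_header_read] EOF marker is absent` line means samtools found a BAM that does not end with the 28-byte empty BGZF block the SAM/BAM specification defines as an end-of-file marker, which usually indicates the file was cut short by an earlier step. A minimal sketch for checking a BAM yourself; `has_bgzf_eof` is a hypothetical helper, and which BAM in the run directory is affected is an assumption to verify, since the log does not name the file.

```python
import os

# The empty BGZF block that the SAM/BAM specification defines as the end-of-file marker
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 00 00"
)

def has_bgzf_eof(path: str) -> bool:
    """Return True if the file at `path` ends with the 28-byte BGZF EOF block."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        fh.seek(-len(BGZF_EOF), os.SEEK_END)
        return fh.read() == BGZF_EOF

if __name__ == "__main__":
    import sys
    # usage: python check_bam.py file1.bam [file2.bam ...]
    for bam in sys.argv[1:]:
        print(bam, "complete" if has_bgzf_eof(bam) else "missing EOF marker (probably truncated)")
```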


amit-upadhyay commented 5 years ago

Hi Irene,

Regarding Trinity, did you pull the latest version? I added the --trinity flag recently so that someone who has multiple versions of Trinity installed can point BALDR at version 2.3.2 specifically. If you did not pull the latest version, BALDR requires Trinity 2.3.2 to be the default version, and you should leave out the --trinity flag.
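The stderr output above reports `Unknown option: trinity`, and the usage text printed with it has no `--trinity` entry, consistent with a checkout from before the flag was added. A hedged sketch that adds `--trinity` only when the help text lists it; all function names are hypothetical. The usage text in this thread does not list every accepted flag (`--STAR` is missing from it yet was not rejected), so a missing entry is a hint, not proof.

```python
import subprocess
from typing import List

def supports_option(help_text: str, option: str) -> bool:
    """True if `option` is the first token on some line of the usage text."""
    return any(line.split()[:1] == [option] for line in help_text.splitlines())

def baldr_help(baldr: str) -> str:
    """Capture `BALDR --help`; join stdout and stderr since the usage may go to either."""
    result = subprocess.run([baldr, "--help"], capture_output=True, text=True)
    return result.stdout + result.stderr

def trinity_args(help_text: str, trinity_path: str) -> List[str]:
    # Older checkouts reject --trinity and expect Trinity 2.3.2 to be the default version
    return ["--trinity", trinity_path] if supports_option(help_text, "--trinity") else []
```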

As for the other error, do you have seqtk and samtools installed? If yes, please remove the STAR and IG-mapped_Unmapped directories and try again.
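Where these directories live is not stated here. The log above shows BALDR writing relative paths such as `Trimmed/...` and `IG-mapped_Unmapped/...`, which suggests (an assumption, not confirmed in this thread) that they are created in the directory BALDR was launched from. A sketch that clears only the two directories named above and leaves everything else, such as `Trimmed/`, in place; `remove_stale_outputs` is hypothetical.

```python
import shutil
from pathlib import Path
from typing import List

# Names from the advice above; assumed to sit in the directory BALDR was launched from
STALE_DIRS = ("STAR", "IG-mapped_Unmapped")

def remove_stale_outputs(run_dir: str) -> List[str]:
    """Delete leftover BALDR output directories in run_dir; return the names actually removed."""
    removed = []
    for name in STALE_DIRS:
        target = Path(run_dir) / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(name)
    return removed
```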

amit-upadhyay commented 5 years ago

If you have docker installed, it would be easier to use the docker image. The instructions regarding that are in the README file.

ibseq commented 5 years ago

Hi Amit, yes, Trinity is the correct version; it was fine until today.

And yes, seqtk and samtools are installed.

"please remove the directories STAR, IG-mapped_Unmapped and try again.": remove from where?

Thanks, Irene


ibseq commented 5 years ago

Thanks Amit, please let me know the exact command, because then if it doesn't work I'll know it's something to do with the installation and missing bits.

On 20 Dec 2018, at 1:39 pm, Amit Upadhyay notifications@github.com wrote:

All right. I see a trim R1 and R2 file. I will run BALDR and get back to you.


amit-upadhyay commented 5 years ago

Hi Irene,

Unfortunately, BALDR does not retrieve the chains in your sample. I ran the latest version. Trinity was able to assemble only three sequences from this dataset, one of which was a partial light chain which eventually gets filtered out. What kind of library preparation was used for this single cell sample? Is this amplicon sequencing by any chance?

The command to use is:

./BALDR --paired Example/IB-031218-1_S1_L001_R1_001.fastq.gz,Example/IB-031218-1_S1_L001_R2_001.fastq.gz --trimmomatic /home/tools/Trimmomatic-0.36/trimmomatic-0.36.jar --adapter /home/tools/Trimmomatic-0.36/adapters/NexteraPE-PE.fa --BALDR /data/BALDR --trinity /home/tools/trinityrnaseq-Trinity-v2.3.2/Trinity --memory 48G --threads 16 --STAR /home/tools/STAR-2.5.2b/bin/Linux_x86_64/STAR --STAR_index /data/GenomeDir/Human/ensembl/GRCh38/star_pri --igblastn /home/tools/ncbi-igblast-1.6.1/bin/igblastn
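Typing a command this long by hand is how the stray space after the comma crept in at the start of this thread. A sketch that assembles the same arguments as a list, joining R1 and R2 in code, and prints the result for review before running; `baldr_argv` is hypothetical, and every path is copied from the command above and will differ on another system.

```python
import shlex
from typing import Dict, List

def baldr_argv(baldr: str, r1: str, r2: str, opts: Dict[str, str]) -> List[str]:
    """Build BALDR's argument list; R1 and R2 are joined here, so no stray space can split them."""
    argv = [baldr, "--paired", f"{r1},{r2}"]
    for flag, value in opts.items():
        argv += [f"--{flag}", value]
    return argv

if __name__ == "__main__":
    # Paths copied from the command above; adjust every one for your own system
    opts = {
        "trimmomatic": "/home/tools/Trimmomatic-0.36/trimmomatic-0.36.jar",
        "adapter": "/home/tools/Trimmomatic-0.36/adapters/NexteraPE-PE.fa",
        "BALDR": "/data/BALDR",
        "trinity": "/home/tools/trinityrnaseq-Trinity-v2.3.2/Trinity",
        "memory": "48G",
        "threads": "16",
        "STAR": "/home/tools/STAR-2.5.2b/bin/Linux_x86_64/STAR",
        "STAR_index": "/data/GenomeDir/Human/ensembl/GRCh38/star_pri",
        "igblastn": "/home/tools/ncbi-igblast-1.6.1/bin/igblastn",
    }
    argv = baldr_argv("./BALDR",
                      "Example/IB-031218-1_S1_L001_R1_001.fastq.gz",
                      "Example/IB-031218-1_S1_L001_R2_001.fastq.gz", opts)
    # Print for review; pass argv to subprocess.run(argv, check=True) to actually run it
    print(shlex.join(argv))
```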

Trinity assembly: more Trinity/IB-031218-1_S1_L001_R1_001.IG-mapped_Unmapped_trinity.Trinity.fasta

>TRINITY_DN20_c3_g1_i1 len=284 path=[1:0-104 83:105-283] [-1, 1, 83, -2]
GCGTTATCCACCTTCCACTGCTCAGGCGTCAGGCTCAGGTAGCTGCTGGCCGCGTACTTG
TTGTTGCTTTGTTTGGAGGCTGTGGTGGTCTCCACTCCCGCCTTGACGGGGCTGCTATCT
GCCTTCCAGGCCACTGTCACGGCTCCCGGGTAGAAGTCACTTATGAGACACACCAGTGTG
GCCTTGTTGGCTTGAAGATCCAGAGAGGAGGGCGAGAACAAAGTGACCGAGGGGGCAGCC
GTAGGCTGACCTAAGACAGTCAGCTTGGTCCCACCGCCGAATAT
>TRINITY_DN20_c3_g2_i1 len=226 path=[571:0-46 83:47-225] [-1, 571, 83, -2]
TGTTGTTGCTTTGTTTGGAGGCTTTGGTGGTCTCCACTCCCGCCTTGACGGGGCTGCTAT
CTGCCTTCCAGGCCACTGTCACGGCTCCCGGGTAGAAGTCACTTATGAGACACACCAGTG
TGGCCTTGTTGGCTTGAAGATCCAGAGAGGAGGGCGAGAACAAAGTGACCGAGGGGGCAG
CCGTAGGCTGACCTAAGACAGTCAGCTTGGTCCCACCGCCGAATAT
>TRINITY_DN21_c0_g1_i1 len=225 path=[203:0-224] [-1, 203, -2]
ACGCTGAGGGTGGCTGAGCCAACACAGAATGGGCCCAGGACCCTGCACAGTGAGTGAGGA
GGGTGAGGAGGAGAGGGAAGCTGGCCATGCTGGAGATTGTCCTGAGTCCTGTCTTCTCTA
CCCACAGCTGAGGCTCCCATGTACTCTGCGTTGATACCACTGCTTAGATCGGAAGAGCGT
CGTGTAGGGAAAGAATAGTGAGGTGCAGACGTTGGGCCTCGCTGG
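To list what Trinity assembled and confirm the `len=` values in the headers, a short FASTA reader is enough. A minimal sketch with hypothetical function names; it assumes standard FASTA with `>` header lines and sequence lines wrapped at any width, as Trinity writes them.

```python
import re
from typing import Dict, Iterator, List, Optional, Tuple

def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining sequence lines wrapped at any width."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def trinity_lengths(text: str) -> Dict[str, Tuple[int, int]]:
    """Map contig ID -> (len= value from the header or -1 if absent, actual sequence length)."""
    lengths = {}
    for header, seq in read_fasta(text):
        contig_id = header.split()[0]
        match = re.search(r"\blen=(\d+)", header)
        lengths[contig_id] = (int(match.group(1)) if match else -1, len(seq))
    return lengths
```

Pointed at `Trinity/IB-031218-1_S1_L001_R1_001.IG-mapped_Unmapped_trinity.Trinity.fasta`, this should list the three contigs Amit mentions.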

IgBLAST_quant_sorted/IB-031218-1_S1_L001_R1_001.IG-mapped_Unmapped.igblast_tabular.quant.sorted.IGKL

TRINITY_DN21_c0_g1_i1 - 195- 225 43452 31 - IGLV1-4401 90.323 31 3 0 0 195 225 1 31 1.52e-05 40.8 - - - - - - - - - - - - - - - - - -- - - - - - IGLV1-4401,IGLV1-4701,IGLV1-4702 N/A VL No N/A N/A - AGCGT N/A N/A - - - - ACGCTGAGGGTGGCTGAGCCAACACAGAATGGGCCCAGGACCCTGCACAGTGAGTGAGGAGGGTGAGGAGGAGAGGGAAGCTGGCCATGCTGGAGATTGTCCTGAGTCCTGTCTTCT CTACCCACAGCTGAGGCTCCCATGTACTCTGCGTTGATACCACTGCTTAGATCGGAAGAGCGTCGTGTAGGGAAAGAATAGTGAGGTGCAGACGTTGGGCCTCGCTGG CCAGCGAGGCCCAACGTCTGCACCTCACTATTCTTTCCCTACACGACGCTCTTCCGATCTAAGCAGTGGTATCAACGCAGAGTACATGGGAGCCTCAGCTGTGGGTAGAGAAGACAGGACTCAGGACAATCTCCAGCATGGCCAGCTTCCCTCTCCTCCTCACCCTCCTCACTCACTGTGCAGGGTCCTGGGCCCATTCTGTGTT GGCTCAGCCACCCTCAGCGT CATTCTGTGTTGGCTCAGCCACCCTCAGCGT

ibseq commented 5 years ago

Hi Amit, apologies for the late reply. So, the data is good and working, as I analysed it using MiXCR, but this tool somehow did not detect the CDR3 regions (though it did detect all the others).

It was a Nextera library.

Any advice? I wanted to compare at least three tools to make sure the results were consistent.

Irene


amit-upadhyay commented 5 years ago

Hi Irene,

This does not appear to be conventional mRNA RNA-Seq data but looks like Ig repertoire data. I looked at the STAR ReadsPerGene file and there is no mapping to other genes. BALDR was not designed for these libraries.

Some tools that were designed for this type of data include the Immcantation pipeline, IMSEQ and IgReC, among others. Also, a quick way to look at the sequences would be to merge the paired-end reads with a tool such as FLASH or PANDAseq and then annotate the merged sequences with IgBLAST or MiGMAP.
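Read-pair mergers such as FLASH and PANDAseq work by reverse-complementing R2 and finding where it overlaps the end of R1. A toy sketch of that idea only, not either tool's actual algorithm: it ignores base qualities, takes overlapping bases from R1, tries the longest overlap first, and does not handle inserts shorter than the read length. The function names and default thresholds are hypothetical.

```python
from typing import Optional

# Complement table for uppercase DNA; N stays N
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(r1: str, r2: str, min_overlap: int = 10,
               max_mismatch_frac: float = 0.1) -> Optional[str]:
    """Toy merge: overlap the end of R1 with the start of revcomp(R2).

    Accepts the longest overlap whose mismatch rate is within max_mismatch_frac.
    Returns None when no overlap of at least min_overlap bases qualifies.
    """
    r2rc = revcomp(r2)
    for olen in range(min(len(r1), len(r2rc)), max(min_overlap, 1) - 1, -1):
        mismatches = sum(a != b for a, b in zip(r1[-olen:], r2rc[:olen]))
        if mismatches <= max_mismatch_frac * olen:
            # Keep R1 in full, then append the part of revcomp(R2) beyond the overlap
            return r1 + r2rc[olen:]
    return None
```

The merged sequences would then be converted to FASTA and annotated with IgBLAST or MiGMAP as Amit suggests.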

Amit.