cxzhu / Paired-seq

Paired-seq
14 stars 5 forks source link

how to use this tool #1

Closed fangling0913 closed 4 years ago

fangling0913 commented 4 years ago

I download the raw read and try to get the valid RNA data. The 'proc_RNA.sh' can't run correctly. How can I get RNA data with correct barcode with the rawreads (SRR8980195).

cxzhu commented 4 years ago

Can you upload the error messages?

I download the raw read and try to get the valid RNA data. The 'proc_RNA.sh' can't run correctly. How can I get RNA data with correct barcode with the rawreads (SRR8980195).

fangling0913 commented 4 years ago

@cxzhu proc_RNA.sh ''' s=$1

reacthools combine CZ${s}_R1.fq.gz CZ${s}_R2.fq.gz zcat CZ${s}_combined.fq.gz | bowtie /projects/ps-renlab/chz272/genome_ref/cell_id/cell_id -p 8 -v 3 --norc - -S CZ${s}.sam

reachtools convert CZ${s}.sam trim_galore -a AAAAAAAAAAAAAAAACCTGCAGGNNNNNNNNNN CZ${s}_cov.fq.gz trim_galore -a GGGGGGNNNNNNNNNNNNNNNN CZ${s}_cov_trimmed.fq.gz

STAR --runThreadN 6 --genomeDir /projects/ps-renlab/chz272/genome_ref/refdata-cellranger-mm10-3.0.0/star/ --readFilesIn CZ${s}_cov_trimmed_trimmed.fq.gz --readFilesCommand zcat --outFileNamePrefix CZ${s}_mm10 samtools sort CZ${s}_mm10Aligned.sam CZ${s}_mm10 reachtools rmdup CZ${s}_mm10.bam

reachtools bam2Mtx CZ${s}_mm10.bam mm10

'''

  1. The rawreads contain RNA and DNA reads but I can't split them in this pipline.
  2. The 'convert' tool always report the error: [1]+ Segmentation fault (core dumped) I noticed that there are similar tools: convertReads and convRead2. What's the difference?
cxzhu commented 4 years ago

@cxzhu proc_RNA.sh ''' s=$1

reacthools combine CZ${s}_R1.fq.gz CZ${s}_R2.fq.gz zcat CZ${s}_combined.fq.gz | bowtie /projects/ps-renlab/chz272/genome_ref/cell_id/cell_id -p 8 -v 3 --norc - -S CZ${s}.sam

reachtools convert CZ${s}.sam trim_galore -a AAAAAAAAAAAAAAAACCTGCAGGNNNNNNNNNN CZ${s}_cov.fq.gz trim_galore -a GGGGGGNNNNNNNNNNNNNNNN CZ${s}_cov_trimmed.fq.gz

STAR --runThreadN 6 --genomeDir /projects/ps-renlab/chz272/genome_ref/refdata-cellranger-mm10-3.0.0/star/ --readFilesIn CZ${s}_cov_trimmed_trimmed.fq.gz --readFilesCommand zcat --outFileNamePrefix CZ${s}_mm10 samtools sort CZ${s}_mm10Aligned.sam CZ${s}_mm10 reachtools rmdup CZ${s}_mm10.bam

reachtools bam2Mtx CZ${s}_mm10.bam mm10

'''

  1. The rawreads contain RNA and DNA reads but I can't split them in this pipline.
  2. The 'convert' tool always report the error: [1]+ Segmentation fault (core dumped) I noticed that there are similar tools: convertReads and convRead2. What's the difference?
  1. The raw reads only contain RNA reads. The DNA reads are separate files.
  2. Can you also upload the output log from "reachtools combine" and "bowtie". The convertReads and convRead2 are obsolete. Only "convert" function can work.
fangling0913 commented 4 years ago

@cxzhu To save time, I used the top 1 million reads to try. log from "reachtools combine" ''' 1000000 read pairs processed. 454235 read pairs passed docking rate. '''

log from "bowtie" ''' reads processed: 454235 reads with at least one reported alignment: 408336 (89.90%) reads that failed to align: 45899 (10.10%) Reported 408336 alignments '''

cxzhu commented 4 years ago

@cxzhu To save time, I used the top 1 million reads to try. log from "reachtools combine" ''' 1000000 read pairs processed. 454235 read pairs passed docking rate. '''

log from "bowtie" ''' reads processed: 454235 reads with at least one reported alignment: 408336 (89.90%) reads that failed to align: 45899 (10.10%) Reported 408336 alignments '''

As the C error message gives very little information. Please use the following perl script instead of this function: `

!/usr/bin/perl

use strict; use warnings;

open IN, "$ARGV[0]" or die $!; my $name = substr($ARGV[0],0,length($ARGV[0])-4); open OUT, "|gzip - > $name_cov.fq.gz";

while(\<IN>){ next if m/^\@/; my @tmp = split/\s+/, $_; next if $tmp[2] eq '*'; my $cell_id = $tmp[2]; my @sp = split/\:/, $tmp[0]; my $readname = "@".$sp[0].":".$sp[1].":".$sp[2].":".$sp[3].":".$sp[4].":".$sp[5].":".$sp[6].":".$cell_id.":".$sp[7]; my $read = $sp[8]; my $l = length($read); my $mark = "+"; my $qual = substr($tmp[0], -$l, $l); print OUT "$readname\n$read\n$mark\n$qual\n"; } close IN; close OUT; `

fangling0913 commented 4 years ago

@cxzhu This is the error message in the while loop of the perl script. I don't know what you want to write ' Use of uninitialized value $_ in pattern match (m//) at /data2/gminix/projectnew/fangling/software/Paired-seq/convert.pl line 10. Use of uninitialized value $ in split at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 11. Use of uninitialized value $tmp[2] in string eq at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 12. Use of uninitialized value in split at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 14. Use of uninitialized value $sp[0] in concatenation (.) or string at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 15. Use of uninitialized value in concatenation (.) or string at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 15. Use of uninitialized value in concatenation (.) or string at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 15. Use of uninitialized value in concatenation (.) or string at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 15. Use of uninitialized value in concatenation (.) or string at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 15. Use of uninitialized value in concatenation (.) or string at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 15. Use of uninitialized value in concatenation (.) or string at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 15. Use of uninitialized value $cell_id in concatenation (.) or string at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 15. Use of uninitialized value in concatenation (.) or string at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 15. Use of uninitialized value $l in negation (-) at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 19. Use of uninitialized value $l in substr at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 19. Use of uninitialized value in substr at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 19. Use of uninitialized value $read in concatenation (.) or string at /data2/gminix/projectnew/fangling/software/Paired-seq/convert.pl line 20. Use of uninitialized value $ in pattern match (m//) at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 10. '

cxzhu commented 4 years ago

@cxzhu This is the error message in the while loop of the perl script. I don't know what you want to write ' Use of uninitialized value $_ in pattern match (m//) at /data2/gminix/projectnew/fangling/software/Paired-seq/convert.pl line 10. Use of uninitialized value $ in split at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 11. Use of uninitialized value $tmp[2] in string eq at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 12. Use of uninitialized value in split at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 14. Use of uninitialized value $sp[0] in concatenation (.) or string at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 15. Use of uninitialized value in concatenation (.) or string at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 15. Use of uninitialized value in concatenation (.) or string at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 15. Use of uninitialized value in concatenation (.) or string at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 15. Use of uninitialized value in concatenation (.) or string at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 15. Use of uninitialized value in concatenation (.) or string at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 15. Use of uninitialized value in concatenation (.) or string at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 15. Use of uninitialized value $cell_id in concatenation (.) or string at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 15. Use of uninitialized value in concatenation (.) or string at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 15. Use of uninitialized value $l in negation (-) at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 19. Use of uninitialized value $l in substr at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 19. Use of uninitialized value in substr at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 19. Use of uninitialized value $read in concatenation (.) or string at /data2/gminix/projectnew/fangling/software/Paired-seq/convert.pl line 20. Use of uninitialized value $ in pattern match (m//) at /data2/gminix/project_new/fangling/software/Paired-seq/convert.pl line 10. '

Please refer to "https://github.com/cxzhu/Paired-seq/blob/master/convert.pl". I think github masked some symbols in this comment box.

fangling0913 commented 4 years ago

@cxzhu Sorry, It still doesn't work.

This is sam file: SRR8980195.1:TTTTGCTACT:GGGGTGAGTGGTGCCTGGAGGCTAACTCAAGAACACTGGCGTCTCCTTCATTG:DDCDDIIIIHHHHIIHHHHHHIIGIIIIIIHIIIIIIIHIIHIIGIIIIIGFH 0 19:03:55:05 1 255 24M * 0 0 ACGATTGAAACGTCCAAAATGAGG DDBDDIIGHIIIIG?EGEEHIIII XA:i:2 MD:Z:15C7A0 NM:i:2 XM:i:2

'my @sp = split/\:/, $tmp[0]; my $readname = "@".$sp[0].":".$sp[1].":".$sp[2].":".$sp[3].":".$sp[4].":".$sp[5].":".$sp[6].":".$cell_id.":".$sp[7]; my $read = $sp[8];'

$sp only has four items. how to write the sp4-8 Is there something wrong with my sam file?

cxzhu commented 4 years ago

@fangling0913

@cxzhu Sorry, It still doesn't work.

This is sam file: SRR8980195.1:TTTTGCTACT:GGGGTGAGTGGTGCCTGGAGGCTAACTCAAGAACACTGGCGTCTCCTTCATTG:DDCDDIIIIHHHHIIHHHHHHIIGIIIIIIHIIIIIIIHIIHIIGIIIIIGFH 0 19:03:55:05 1 255 24M * 0 0 ACGATTGAAACGTCCAAAATGAGG DDBDDIIGHIIIIG?EGEEHIIII XA:i:2 MD:Z:15C7A0 NM:i:2 XM:i:2

'my @sp = split/:/, $tmp[0]; my $readname = "@".$sp[0].":".$sp[1].":".$sp[2].":".$sp[3].":".$sp[4].":".$sp[5].":".$sp[6].":".$cell_id.":".$sp[7]; my $read = $sp[8];'

$sp only has four items. how to write the sp4-8 Is there something wrong with my sam file?

I finally got the point!

The format of standard sam from illumina fastq files: "7001113:1032:HF3CJBCX3:1:1107:8388:2231:TTATTCATAA:CGGACGGACGAATGCTCTGGCCTCTCAAGCACGTGGATTGCTACGGGTCTGAGTTCGCACCGAAACATCGGCCACGTACCTCCTTTATGAATAA:@D@DDHHCHCC?GECG@@HG1CHIFHIC<FCGHFHHEE<CGEHIHFDHDHHHIFGEGHHD<H/<CCEEEGDDHIHIHIHIHH11C@HIIFHEHH"

for the fastq dumped from SRA, the readname is shortened to only 1 word. So please modify the code to: ' my $readname = "@".$sp[0].":".$cell_id.":".$sp[1]; my $read = $sp[2]; '

fangling0913 commented 4 years ago

@cxzhu It works now. Thanks