YosefLab / BRAPeS

BCR reconstruction from short single cell RNA-seq
AttributeError #5

Open tAndreani opened 4 years ago

tAndreani commented 4 years ago


I have downloaded and installed BRAPeS, and the mouse test data provided with the tool works well.

I am now trying with my own human data. The FASTQ files were mapped with TopHat (I also tried STAR, but I had the same problem). The reads are 25 bp long, but I have this problem:

The command is:

python brapes.py -genome hg38 -path Example_our_data/proc_data/ -sumF Example_our_data/BRAPeS_out/BCR_test.out  -output BRAPeS_out/BCR_test.out -score 10 -top 5 -byExp -unmapped unmapped.bam -bam accepted_hits.bam

The error is:

2020-04-28 15:42:31.107411 Working on: RL0654
2020-04-28 15:42:31.126610 Pre-processing heavy chain
Traceback (most recent call last):
  File "brapes.py", line 2157, in <module>
  File "brapes.py", line 52, in runBCRpipe
    hvr_path, genome)
  File "brapes.py", line 282, in runSingleCell
    unDictHeavy = analyzeChain(fastaDict, vdjDict, output, bam, unmapped, idNameDict, bases, 'H', strand, NolowQ, top, byExp, readOverlap, downsample, organism, Hminus, Kminus, Lminus)
  File "brapes.py", line 1440, in analyzeChain
    junctionSegs = makeJunctionFile(bam, chain, output, bases, vdjDict, fastaDict, idNameDict, top, byExp, readOverlap, organism)
  File "brapes.py", line 1473, in makeJunctionFile
    junctionSegs = writeJunctions(vjReads,outName, bases, fastaDict, idNameDict, cSeq, top, vjCounts, byExp)
  File "brapes.py", line 1999, in writeJunctions
  File "/site/ne/app/x86_64/python/v2.7.13/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 529, in write
  File "/site/ne/app/x86_64/python/v2.7.13/lib/python2.7/site-packages/Bio/SeqIO/FastaIO.py", line 341, in as_fasta
    id = _clean(record.id)
AttributeError: 'str' object has no attribute 'id'

What could be the problem?
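The traceback ends inside Biopython's FASTA writer, which reads record.id from every item it is asked to write. That suggests writeJunctions handed SeqIO.write plain strings instead of SeqRecord objects. The actual change made in brapes.py is not shown in this thread, so the following is only a minimal sketch, assuming that cause, of wrapping strings in SeqRecord objects before writing. The helper name write_junctions_fasta and the record ids are hypothetical, and the sketch targets Python 3 rather than the Python 2.7 install shown in the traceback.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def write_junctions_fasta(seqs_by_name):
    """Hypothetical helper: write plain sequence strings as FASTA text.

    SeqIO.write expects SeqRecord objects; giving it bare strings makes the
    FASTA writer fail with "AttributeError: 'str' object has no attribute 'id'".
    Each string is therefore wrapped in a SeqRecord first.
    """
    records = [
        SeqRecord(Seq(seq), id=name, description="")  # empty description -> header is just ">name"
        for name, seq in seqs_by_name.items()
    ]
    handle = StringIO()
    SeqIO.write(records, handle, "fasta")
    return handle.getvalue()


print(write_junctions_fasta({"junction_1": "ACGTACGT"}))
```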

shakea02 commented 4 years ago

Dear Tommaso, I think I have fixed the problem; can you pull the new version and try again? Also, I noticed that you ran with "-score 10". For 25 bp reads, in my experience "-score 15" or "-score 21" works well, so I recommend trying one of those.

Best, Shaked

tAndreani commented 4 years ago

Thank you, now it works. I have followed your suggestion to use -score 10 and I get some results. However, processing an entire sample takes quite long: up to one day. Is this normal? It is a single cell from the BASIC dataset that you also used in your manuscript, so I would like to know whether you saw roughly the same execution time.

Thank you in advance for any reply.

shakea02 commented 4 years ago

Hi, the BASIC samples were sequenced relatively deeply, which is why they take longer than other samples, but it still shouldn't take a day. For such samples I recommend the "-downsample" parameter: it downsamples the reads, which makes the run faster.
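The thread does not say how BRAPeS' -downsample option chooses which reads to keep. If you would rather reduce depth yourself before mapping, a seeded reservoir sample over FASTQ records is one standard way to do it. This is a generic sketch of that technique, not BRAPeS code; the function names are hypothetical and the input is assumed to be an uncompressed, 4-lines-per-record FASTQ.

```python
import random
from itertools import islice


def read_fastq(lines):
    """Yield 4-line FASTQ records (header, sequence, '+', qualities) as tuples."""
    it = iter(lines)
    while True:
        record = tuple(islice(it, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def downsample_fastq(lines, n, seed=0):
    """Keep a uniform random sample of at most n records (reservoir sampling, Algorithm R)."""
    rng = random.Random(seed)  # fixed seed -> reproducible subsample
    reservoir = []
    for i, record in enumerate(read_fastq(lines)):
        if i < n:
            reservoir.append(record)  # fill the reservoir with the first n records
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < n:
                reservoir[j] = record  # replace with probability n / (i + 1)
    return reservoir


# Example (hypothetical file name):
# with open("sample.R1.fastq") as fh:
#     kept = downsample_fastq((line.rstrip("\n") for line in fh), 1_000_000)
```

For paired-end input such as the R1/R2 pair in the STAR command further down, both files would have to be sampled jointly (for example by zipping their records) so that mates stay together.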

Here are the parameters I used when running on the 25 bp version of the BASIC cells: "-score 15 -top 6 -byExp -iterations 6 -downsample -oneSide"
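To try both score values mentioned earlier (15 and 21) with the rest of these settings, one could generate the commands programmatically. This minimal sketch only assembles argument lists from flags that appear in this thread; the helper name brapes_command and all paths are hypothetical placeholders, and nothing is executed.

```python
import shlex


def brapes_command(genome, path, bam, unmapped, output, sum_file, score=15):
    """Hypothetical helper: build a brapes.py argument list with the settings
    quoted above for 25 bp BASIC cells, varying only -score."""
    return [
        "python", "brapes.py",
        "-genome", genome,
        "-path", path,
        "-sumF", sum_file,
        "-output", output,
        "-bam", bam,
        "-unmapped", unmapped,
        "-score", str(score),
        "-top", "6",
        "-byExp",
        "-iterations", "6",
        "-downsample",
        "-oneSide",
    ]


for score in (15, 21):
    cmd = brapes_command(
        "hg38", "proc_data/", "accepted_hits.bam", "unmapped.bam",
        "BRAPeS_out/BCR_test_s%d.out" % score,  # placeholder output names
        "BRAPeS_out/BCR_test_s%d.out" % score,
        score=score,
    )
    print(shlex.join(cmd))  # shlex.join needs Python 3.8+
```

Each list could then be passed to subprocess.run(cmd, check=True); keeping the arguments as a list avoids shell-quoting problems with paths.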

Let me know if that helps! Best, Shaked

tAndreani commented 3 years ago

Dear Shaked,

Sorry for the late reply. I was able to run BRAPeS on the BASIC dataset with at least 1 million reads and a read length of 50 bp. I mapped the reads with STAR using these parameters:

STAR --runThreadN 8 --genomeDir STAR_GRCh38_index --genomeLoad NoSharedMemory --readFilesCommand zcat --readFilesIn Sample1.R1.50bp.length.1mln.250K.cov.gz Sample1.R2.50bp.length.1mln.250K.cov.gz --outSAMunmapped Within --outSAMtype BAM SortedByCoordinate --limitBAMsortRAM 52844963932 --outFileNamePrefix file.$sample
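With --outSAMunmapped Within, STAR writes unmapped reads into the same BAM as the aligned ones, whereas TopHat writes separate accepted_hits.bam and unmapped.bam files (the pair used in the first command in this thread). Whether BRAPeS needs them split is not stated here; if separate files are wanted, unmapped records are the ones with bit 0x4 set in the SAM FLAG field. A minimal pure-Python sketch over SAM text (for example the output of samtools view -h); converting the halves back to BAM is left to samtools or pysam, and the function name is hypothetical.

```python
def split_sam_by_mapping(sam_lines):
    """Split SAM text lines into (mapped, unmapped) lists using FLAG bit 0x4.

    Header lines (starting with '@') are copied to both outputs so each half
    stays a valid SAM file. Blank lines are skipped.
    """
    mapped, unmapped = [], []
    for line in sam_lines:
        if not line.strip():
            continue
        if line.startswith("@"):
            mapped.append(line)
            unmapped.append(line)
            continue
        flag = int(line.split("\t", 2)[1])  # FLAG is the 2nd tab-separated column
        (unmapped if flag & 0x4 else mapped).append(line)
    return mapped, unmapped
```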

I then ran BRAPeS on the mapped reads with the -downsample option:

python brapes.py -genome hg38_num -path proc_data/ -bam proc_data/Sample1/ -sumF Sample1/BRAPeS_out/BCR1.Sample1.out -output Sample1/BRAPeS_out/BCR1.Sample1.out -score 15 -top 6 -byExp -iterations 6 -downsample -oneSide -unmapped Sample1.bam -bam Sample1.bam

(The mapped and unmapped reads were all in one file.)
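The command above passes -bam twice ("proc_data/Sample1/" and "Sample1.bam"). If brapes.py parses its options with Python's argparse (an assumption; the parser is not shown in this thread), a "store" option given twice keeps only its last value. This sketch uses a hypothetical parser that mirrors just those two options to show the behavior:

```python
import argparse

# Hypothetical parser: mirrors only -bam and -unmapped, NOT brapes.py's real parser.
parser = argparse.ArgumentParser(prog="brapes.py")
parser.add_argument("-bam")
parser.add_argument("-unmapped")

args = parser.parse_args(
    ["-bam", "proc_data/Sample1/", "-unmapped", "Sample1.bam", "-bam", "Sample1.bam"]
)
print(args.bam)  # the later "-bam Sample1.bam" overrides the earlier value
```

Under that assumption, the first -bam value is silently discarded, so only one -bam per command is needed.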

With these settings I am not able to reconstruct the heavy chains. Has this ever happened to you? Which parameters did you use to map the reads with TopHat?

Thank you in advance for your feedback.
