YosefLab / BRAPeS

BCR reconstruction from short single cell RNA-seq

AttributeError #5

Open tAndreani opened 4 years ago

tAndreani commented 4 years ago

Hello,

I downloaded and installed BRAPeS, and the mouse test data provided with the tool works well.

I am now trying it with my own human data. The FASTQ files were mapped with TopHat (I also tried STAR, but had the same problem). The reads are 25 bp long, and I get this error:

The command is:

python brapes.py -genome hg38 -path Example_our_data/proc_data/ -sumF Example_our_data/BRAPeS_out/BCR_test.out  -output BRAPeS_out/BCR_test.out -score 10 -top 5 -byExp -unmapped unmapped.bam -bam accepted_hits.bam

The error is:

2020-04-28 15:42:31.107411 Working on: RL0654
2020-04-28 15:42:31.126610 Pre-processing heavy chain
Traceback (most recent call last):
  File "brapes.py", line 2157, in <module>
    args.Fr)
  File "brapes.py", line 52, in runBCRpipe
    hvr_path, genome)
  File "brapes.py", line 282, in runSingleCell
    unDictHeavy = analyzeChain(fastaDict, vdjDict, output, bam, unmapped, idNameDict, bases, 'H', strand, NolowQ, top, byExp, readOverlap, downsample, organism, Hminus, Kminus, Lminus)
  File "brapes.py", line 1440, in analyzeChain
    junctionSegs = makeJunctionFile(bam, chain, output, bases, vdjDict, fastaDict, idNameDict, top, byExp, readOverlap, organism)
  File "brapes.py", line 1473, in makeJunctionFile
    junctionSegs = writeJunctions(vjReads,outName, bases, fastaDict, idNameDict, cSeq, top, vjCounts, byExp)
  File "brapes.py", line 1999, in writeJunctions
    SeqIO.write(sorted_pairs[0][0],out,'fasta')
  File "/site/ne/app/x86_64/python/v2.7.13/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 529, in write
    fp.write(format_function(record))
  File "/site/ne/app/x86_64/python/v2.7.13/lib/python2.7/site-packages/Bio/SeqIO/FastaIO.py", line 341, in as_fasta
    id = _clean(record.id)
AttributeError: 'str' object has no attribute 'id'

What could be the problem?
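The last frame of the traceback shows Biopython's FASTA writer reading `record.id`, an attribute of `SeqRecord` objects, so the value reaching `SeqIO.write` in `writeJunctions` was a plain string. The stdlib-only sketch below reproduces that failure mode and one defensive fix (wrapping a bare sequence string in a record before writing). `Record`, `write_fasta` and `as_record` are hypothetical stand-ins for Biopython's `SeqRecord` and `SeqIO.write`; this is an illustration, not the patch that was actually applied to BRAPeS.

```python
import io
from typing import Iterable, NamedTuple, Union


class Record(NamedTuple):
    """Hypothetical stand-in for Biopython's SeqRecord: an ID plus a sequence."""
    id: str
    seq: str


def write_fasta(records: Iterable[Record], handle) -> int:
    """Write records as FASTA; like SeqIO.write, it reads record.id on each item."""
    count = 0
    for record in records:
        # A bare str has no .id attribute, so this line raises AttributeError.
        handle.write(f">{record.id}\n{record.seq}\n")
        count += 1
    return count


def as_record(item: Union[Record, str], default_id: str = "junction") -> Record:
    """Wrap a bare sequence string in a Record; pass existing records through."""
    if isinstance(item, str):
        return Record(id=default_id, seq=item)
    return item


if __name__ == "__main__":
    out = io.StringIO()
    try:
        write_fasta(["ACGTACGT"], out)  # a str where a record is expected
    except AttributeError as err:
        print(err)  # 'str' object has no attribute 'id'
    write_fasta([as_record("ACGTACGT")], out)
    print(out.getvalue(), end="")
```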

shakea02 commented 4 years ago

Dear Tommaso, I think I have fixed the problem; can you pull the new version and try again? Also, I noticed that you used "-score 10" for your run. For 25 bp reads, in my experience "-score 15" or "-score 21" works well, so I recommend trying those.

Best, Shaked

tAndreani commented 4 years ago

Thank you, now it works. I followed your suggestion to use -score 10 and I get some results. However, processing an entire sample takes quite long: up to one day. It is a single cell from the BASIC paper dataset, which you also used in your manuscript, so I would like to know whether you saw similar execution times.

Thank you in advance for any reply.

shakea02 commented 4 years ago

Hi, the BASIC samples were sequenced relatively deeply, which is why they take longer than other samples, but it still shouldn't take a day. For such samples I recommend the "-downsample" parameter, which downsamples the reads so the run is faster.
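BRAPeS's `-downsample` flag does this internally, and the thread does not say how. As a separate, hypothetical pre-processing step, a deep FASTQ file could also be thinned before mapping. The minimal sketch below keeps each four-line FASTQ record with probability `fraction`, using a fixed seed so the subsample is reproducible; `read_fastq` and `subsample_fastq` are illustrative names, not part of BRAPeS.

```python
import io
import random
from typing import Iterator, List, TextIO


def read_fastq(handle: TextIO) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of four lines (header, sequence, '+', quality)."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:
            return  # clean end of file
        if not record[3]:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_fastq(src: TextIO, dst: TextIO, fraction: float, seed: int = 0) -> int:
    """Copy each record from src to dst with probability `fraction`; return the count kept."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed: the same reads are kept on every run
    kept = 0
    for record in read_fastq(src):
        # Exactly one draw per record, so mate files with identical order stay paired.
        if rng.random() < fraction:
            dst.writelines(record)
            kept += 1
    return kept


if __name__ == "__main__":
    demo = "".join(f"@read{i}\nACGTACGT\n+\nIIIIIIII\n" for i in range(100))
    out = io.StringIO()
    kept = subsample_fastq(io.StringIO(demo), out, fraction=0.25, seed=42)
    print(f"kept {kept} of 100 reads")
```

For gzipped input, the handles could come from `gzip.open(path, "rt")` and `gzip.open(path, "wt")`. For paired-end data, running both mate files with the same seed keeps pairs in sync only if the two files list reads in the same order.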

Here are the parameters I used when running on the 25bp version of the BASIC cells: “-score 15 -top 6 -byExp -iterations 6 -downsample -oneSide"

Let me know if that helps! Best, Shaked
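When scripting runs over many cells, the 25 bp settings above can be kept in one place. The sketch below only assembles the `brapes.py` argument list from flags that appear in this thread; `build_brapes_cmd` is a hypothetical helper, and the genome name and paths in the example are placeholders to adapt. To execute the command, the list could be passed to `subprocess.run(cmd, check=True)`.

```python
import shlex
from typing import List, Sequence

# The 25 bp settings quoted in this thread for the BASIC cells.
BASIC_25BP_FLAGS = ("-score", "15", "-top", "6", "-byExp",
                    "-iterations", "6", "-downsample", "-oneSide")


def build_brapes_cmd(genome: str, path: str, sum_f: str, output: str,
                     bam: str, unmapped: str,
                     extra: Sequence[str] = BASIC_25BP_FLAGS) -> List[str]:
    """Return the argv list for one BRAPeS run; nothing is executed here."""
    return ["python", "brapes.py",
            "-genome", genome,
            "-path", path,
            "-sumF", sum_f,
            "-output", output,
            *extra,
            "-unmapped", unmapped,
            "-bam", bam]


if __name__ == "__main__":
    cmd = build_brapes_cmd("hg38", "proc_data/", "BRAPeS_out/BCR.out",
                           "BRAPeS_out/BCR.out", "accepted_hits.bam", "unmapped.bam")
    print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

Building the list rather than a shell string also guarantees each flag appears once; repeating a flag such as `-bam` typically means only the last value is used.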

tAndreani commented 3 years ago

Dear Shaked,

Sorry for the late reply. I was able to run BRAPeS on the BASIC dataset with at least 1 million reads and a read length of 50 bp. I mapped the reads with STAR using these parameters:

STAR --runThreadN 8 --genomeDir STAR_GRCh38_index --genomeLoad NoSharedMemory --readFilesCommand zcat --readFilesIn Sample1.R1.50bp.length.1mln.250K.cov.gz Sample1.R2.50bp.length.1mln.250K.cov.gz --outSAMunmapped Within --outSAMtype BAM SortedByCoordinate --limitBAMsortRAM 52844963932 --outFileNamePrefix file.$sample

I then ran BRAPeS on the mapped reads with the -downsample option:

python brapes.py -genome hg38_num -path proc_data/ -bam proc_data/Sample1/ -sumF Sample1/BRAPeS_out/BCR1.Sample1.out -output Sample1/BRAPeS_out/BCR1.Sample1.out -score 15 -top 6 -byExp -iterations 6 -downsample -oneSide -unmapped Sample1.bam -bam Sample1.bam

(The mapped and unmapped reads were all in one file.)

With these settings I am not able to reconstruct the heavy chains. Has this ever happened to you? Which parameters did you use to map the reads with TopHat?

Thank you in advance for your feedback.

Tommaso
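With STAR's `--outSAMunmapped Within`, unmapped reads stay in the same BAM and are marked by bit 0x4 of the SAM FLAG field. If BRAPeS expects genuinely separate mapped and unmapped files (an assumption; the thread does not confirm it), the single BAM can be split on that bit, for example with `samtools view -b -F 4` (mapped) and `samtools view -b -f 4` (unmapped). The stdlib sketch below shows the same logic on SAM text lines; `split_sam_by_mapping` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple

UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def split_sam_by_mapping(lines: Iterable[str]) -> Tuple[List[str], List[str], List[str]]:
    """Split SAM text into (header, mapped, unmapped) lines using FLAG bit 0x4."""
    header: List[str] = []
    mapped: List[str] = []
    unmapped: List[str] = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("@"):
            header.append(line)  # header lines belong at the top of both outputs
            continue
        flag = int(line.split("\t", 2)[1])  # FLAG is the 2nd tab-separated field
        (unmapped if flag & UNMAPPED else mapped).append(line)
    return header, mapped, unmapped


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.6\tSO:coordinate\n",
        "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII\n",
    ]
    h, m, u = split_sam_by_mapping(sam)
    print(len(h), len(m), len(u))
```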