AmpliconSuite / AmpliconSuite-pipeline

A quickstart tool for AmpliconArchitect. Performs all preliminary steps (alignment, CNV calling, seed interval detection) required prior to running AmpliconArchitect. Previously called PrepareAA.
Other
48 stars 25 forks source link

b37 reference genome #33

Closed yliu34 closed 1 year ago

yliu34 commented 1 year ago

Hello! I have been trying to run the pipeline these days but have not worked it out. I believe the problem was because of the reference genome since the error is "Could not match BAM to a known AA reference genome" even though it did not stop until "Running CNVKit segment". I am wondering if there is a way to use b37 reference genome. Thank you and I am looking forward to your response.

Best regards, Yang

yliu34 commented 1 year ago

I also tried transforming bams to fastqs and then using your pipeline with fastqs as input. It did not work either. The error message is as follows: 2023-02-16 21:44:29.484346 PrepareAA version 0.1344.3

Running PrepareAA on sample: test_02162 Running pipeline on /home/yliu34/bams_amplicon/IPCT-S4024-MOON0056-222867-WG01_200615-A00728-0199-AH7L2HDSXY-2-CACTTCGA.bwa_recalibed_r1.fastq /home/yliu34/bams_amplicon/IPCT-S4024-MOON0056-222867-WG01_200615-A00728-0199-AH7L2HDSXY-2-CACTTCGA.bwa_recalibed_r2.fastq /home/yliu34/test_02162 Checking for ref index

Performing alignment and sorting { bwa mem -K 10000000 -t 4 /home/yliu34/data_repo/hg19/hg19full.fa /home/yliu34/bams_amplicon/IPCT-S4024-MOON0056-222867-WG01_200615-A00728-0199-AH7L2HDSXY-2-CACTTCGA.bwa_recalibed_r1.fastq /home/yliu34/bams_amplicon/IPCT-S4024-MOON0056-222867-WG01_200615-A00728-0199-AH7L2HDSXY-2-CACTTCGA.bwa_recalibed_r2.fastq | samtools view -Shu - | samtools sort -m 4G -@4 -o /home/yliu34/test_02162.cs.bam -; } 2>/home/yliu34/test_02162_aln_stage.stderr

Performing duplicate removal & indexing samtools rmdup -s /home/yliu34/test_02162.cs.bam /home/yliu34/test_02162.cs.rmdup.bam

Running samtools index samtools index /home/yliu34/test_02162.cs.rmdup.bam Removing temp BAM /home/yliu34/test_02162.cs.rmdup.bam: 0 + 0 properly paired (N/A : N/A) Traceback (most recent call last): File "/home/programs/AmpliconSuite-pipeline-master/PrepareAA.py", line 764, in check_reference.check_properly_paired(args.sorted_bam) File "/home/programs/AmpliconSuite-pipeline-master/check_reference.py", line 79, in check_properly_paired ppp = float(t.rsplit("(")[-1].rsplit("%")[0]) ValueError: could not convert string to float: N/A : N/A)

jluebeck commented 1 year ago

Hi Yang,

Sorry to hear there have been issues. Can you please provide the header of the original bam file you attempted to use, as well as the header of the the test_02162.cs.rmdup.bam (samtools view -H file.bam)? You can email them to me at jluebeck [at ] eng.ucsd.edu. Additionally can you share what version of samtools you are using? Have you attempted using the dockerized version of AmpliconSuite-pipeline?

yliu34 commented 1 year ago

Hi Jens,

Thanks for the prompt response. Previously, I used samtools/1.9 to convert bams to fastqs. Today, I tried Picard for the conversion and it is running right now. Hopefully, it will be successful since it has been 2 hours.

Due to institute limitations, I used singularity container to run the docker codes. But it should work similarly to docker.

jluebeck commented 1 year ago

Hi Yang,

Thank you for sending these files. Can you please check that you have downloaded the GRCh37.tar.gz data repo from https://datasets.genepattern.org/?prefix=data/module_support_files/AmpliconArchitect/ and placed it in your $AA_DATA_REPO directory? Once in place you should be able to use your original bam file - which is GRCh37-aligned.

Note that the hg19 and GRCh37 data repos cannot be used interchangeably due to the differences in chromosome naming.

Regarding your second error - the N/A/ message using your re-aligned fastq files - this seems perhaps like an error with the unmapping process - reads must be sorted by name before unmapping and remapping. How large was the bam file that resulted from the remapping process? Here is a script for the unmapping process you can use if you decide to continue with that route.

Thanks, Jens

yliu34 commented 1 year ago

Hello Jens,

Thank you for your kind help. I used GRCh37 as the reference genome and it works perfectly. You may close the ticket.

Best, Yang