Closed (talhamufeed closed this 3 months ago)
I also wanted to ask whether AA can run on WXS data. I tried it once, and the job kept running for 2 days until I had to force-stop it. Is WXS analysis doable with AA, and if so, what steps are needed for a correct analysis?
Hi Talha,
Thanks for reaching out with this query. I believe you are running into a truncated BAM file. If you are starting from FASTQ files, the job may have timed out, so that a partial BAM file was passed to the rest of the task. It is also important to note that the GenePattern server is not optimized for large BAM files, and if a file did not upload completely you would run into this issue. Can you please let me know the job ID of your run on the GenePattern server? @edwin5588 @liefeld can help debug this issue on the server by looking at the logs from your job.
As described in our FAQ, AA will absolutely not function with WXS data.
Thanks @jluebeck for the timely response. I think there is an issue with the FASTQ files. My GenePattern run IDs are 598981, 598980, 598979, and 598978, if you could look into these. Most of them keep failing. There is also another issue I just noticed: for one FASTQ file from SRA that is labelled as coming from a female patient, AA has detected an amplicon on Chr Y. Is this due to an alignment issue? Best, Talha Mufeed
Thank you, I will put you in touch with the GenePattern team to debug this issue.
The chrY amplicon does sound like an artifact. If the FASTQ files are corrupted or truncated, then there can certainly be incorrect results. We will take a look and follow up with you via email.
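For anyone hitting the same problem, one quick way to tell whether a gzipped FASTQ was cut short is to decompress it to the end and see whether the stream finishes cleanly. Below is a minimal sketch using only the Python standard library. The function name `gzip_is_complete` is made up for illustration and is not part of AA or GenePattern, and the check only covers gzip-level damage, not whether the FASTQ records themselves are sensible.

```python
import gzip
import zlib


def gzip_is_complete(path, chunk_size=1 << 20):
    """Return True if the gzip file at `path` decompresses to the end without error.

    Hypothetical helper for illustration; not part of any AA tooling.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Read and discard the data; only successful decompression matters.
            while handle.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: the stream ended before the end-of-stream marker (truncation).
        # OSError: not a valid gzip file or CRC mismatch (also raised if the
        # file is missing or unreadable).
        # zlib.error: corrupted compressed data.
        return False
    return True
```

Calling `gzip_is_complete("sample_1.fastq.gz")` (a placeholder file name) on each downloaded file before uploading should flag truncated downloads without needing any extra tools.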
Hi,
I have done a bit of digging around, and I am pretty certain that the issue is in the retrieval of the FASTQ files from EBI. I have been unable to download them locally consistently, and on multiple occasions I have failed even to get a directory listing in their fastq directory, both on the FTP server and through Globus. The best approach, I think, will be for you either to download them locally and then upload them to GenePattern, or to use Globus to try to transfer them. In either case, make sure that the size/signature of the files is correct and that you have obtained the entire file.
Hope this helps, Ted
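The size/signature check mentioned above could look roughly like the sketch below. It assumes the archive publishes an expected byte size and an MD5 checksum for each file, and that you have copied those values by hand. The helper names `file_md5` and `download_matches` are hypothetical.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file without loading it all into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_matches(path, expected_size, expected_md5):
    """Return True if the file has the expected size (bytes) and MD5 checksum.

    `expected_size` and `expected_md5` are assumed to come from the archive's
    own metadata for the file.
    """
    # The size check is cheap, so do it first and skip hashing on a mismatch.
    if os.path.getsize(path) != expected_size:
        return False
    return file_md5(path) == expected_md5.strip().lower()
```

Running the check once after the download and again after the transfer to the server would show at which step a file gets cut short.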
Hey Ted, That makes sense. I'll try using globus for the transfer. Thanks for the help. Regards, Talha
Hello there, I have used AmpliconSuite on one cohort, and it detected eccDNA in 6 out of 23 samples. Next I wanted to try it on another cohort with tumor-normal paired FASTQ files, but I have not yet detected any eccDNA, and most of the time the analysis fails when I run it on the GenePattern server. What could be the issue? I'll attach a few of the error messages I have gotten so far with the second cohort.
..............................................
[root:ERROR] The total length of focal amp seed regions was in excess of maximum default length allowed by AA. Please ensure that the file specified by --bed is a valid AA_CNV_SEEDS.bed file, NOT whole genome CNV calls. For more information on producing seed regions for AA, please see AmpliconSuite-pipeline. https://github.com/jluebeck/AmpliconSuite-pipeline
To bypass this error message (not recommended!), set a different value for --max_seedlen (default 500000000).
...............................................
[root:INFO] /Results/xxxxxxx/SRR13215413.cs.rmdup.bam index not found, calling samtools index
[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
[E::bgzf_read] Read block operation failed with error 4 after 0 of 4 bytes
samtools index: failed to create index for "/Results/xxxxx/SRR13215413.cs.rmdup.bam": No such file or directory
[root:INFO] Finished indexing
[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
[E::bgzf_read] Read block operation failed with error 4 after 0 of 4 bytes
[bam_flagstat_core] Truncated file? Continue anyway.
[root:INFO] /Results/xxxxx/SRR13215413.cs.rmdup.bam: 1238482707 + 0 properly paired (97.39% : N/A)
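Both errors above can be screened for before submitting a job. The sketch below is only an illustration under stated assumptions: the helper names are invented, the first check looks only for the standard 28-byte BGZF end-of-file block that samtools reports as missing, and the second simply sums `end - start` over the BED lines without merging overlaps, which may differ from how AA itself totals the seed regions.

```python
import os

# Standard 28-byte empty BGZF block that terminates a complete BAM file.
BGZF_EOF_MARKER = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600" "4243" "0200"
    "1b00" "0300" "00000000" "00000000"
)


def bam_has_eof_marker(path):
    """Return True if the file ends with the BGZF EOF block (hypothetical helper)."""
    if os.path.getsize(path) < len(BGZF_EOF_MARKER):
        return False
    with open(path, "rb") as handle:
        # Jump to the last 28 bytes and compare them with the marker.
        handle.seek(-len(BGZF_EOF_MARKER), os.SEEK_END)
        return handle.read() == BGZF_EOF_MARKER


def total_bed_length(path):
    """Sum interval lengths (end - start) in a BED file; overlaps are not merged."""
    total = 0
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            # Skip blank lines, comments, and track/browser header lines.
            if len(fields) < 3 or line.startswith(("#", "track", "browser")):
                continue
            total += int(fields[2]) - int(fields[1])
    return total
```

A BAM that fails `bam_has_eof_marker` was most likely cut off during alignment or upload, and a seed file whose `total_bed_length` comes out above 500000000 (the default quoted in the error message) is probably whole-genome CNV calls rather than a seeds file.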