XH-BIM opened this issue 6 months ago
Phase I: Build the analysis pipeline for PacBio sequencing data (WGS, RNA, and DNA methylation) with Docker images and cluster computing.
2024.05.28
Tried to run the hifisomatic workflow on the HPC with Singularity and miniwdl, but there are some errors that still need to be figured out. This workflow outputs a whole-genome copy number profile, small variants (SNV/INDEL), coverage and variant allele frequency (VAF) distributions, mutational signatures, a table of filtered SNVs/INDELs, a table of filtered SVs, and differentially methylated regions overlapping promoters. Its input files are a tumor.bam and a normal.bam for each patient, and it provides two large test datasets, the full COLO829 and HCC1395 datasets:
COLO829 (60X tumor, 60X normal): https://downloads.pacbcloud.com/public/revio/2023Q2/COLO829
HCC1395 (60X tumor, 40X normal): https://downloads.pacbcloud.com/public/revio/2023Q2/HCC1395
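For reference, a minimal sketch of launching such a run under miniwdl's Singularity backend. The environment variable follows miniwdl's config-override convention, but the WDL path, the inputs-JSON structure, and the key names here are assumptions for illustration, not hifisomatic's actual schema:

```shell
# Ask miniwdl to run task containers with Singularity instead of Docker.
export MINIWDL__SCHEDULER__CONTAINER_BACKEND=singularity

# Hypothetical inputs file pairing tumor/normal BAMs per patient.
cat > /tmp/hifisomatic_inputs.json <<'EOF'
{
  "cohort": [
    {"tumor_bam": "COLO829.tumor.bam", "normal_bam": "COLO829.normal.bam"}
  ]
}
EOF

# Launch the workflow (guarded so the sketch is a no-op without miniwdl).
if command -v miniwdl >/dev/null; then
  miniwdl run hifisomatic.wdl -i /tmp/hifisomatic_inputs.json --dir runs/
fi
```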
6.04
Roughly drafted the notes in LaTeX format and synchronized them to GitHub.
Tried to run the hifisomatic workflow on the NGS cluster with Singularity and miniwdl. I moved Conda's pkgs directory to the /data hard disk, which resolved the lack of space in the home directory. On the NGS cluster I can run Singularity, but there are still some errors:
docker: "quay.io/pacbio/purple@sha256:8f9a70a1e3c6ee86b5cf41ec31fdc90c7cca744f35d84cd9caa997833245e61d"
This image on quay.io is not working, so I plan to search other registries for it. Until it is available, the downstream CNV analysis also cannot run.
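The relocation itself is just a move plus a symlink, so conda keeps resolving the old cache path. The paths below are demo placeholders under /tmp; on the cluster the source was the pkgs directory under the conda install in the home directory and the destination a directory on /data:

```shell
# Demo paths; substitute e.g. $HOME/miniconda3/pkgs and /data/conda_pkgs.
PKGS_SRC=${PKGS_SRC:-/tmp/demo_home/miniconda3/pkgs}
PKGS_DEST=${PKGS_DEST:-/tmp/demo_data/conda_pkgs}

mkdir -p "$PKGS_SRC" "$(dirname "$PKGS_DEST")"
# Move the package cache to the big disk, then leave a symlink at the
# old location so conda (and any hard-coded paths) still find it.
mv "$PKGS_SRC" "$PKGS_DEST"
ln -s "$PKGS_DEST" "$PKGS_SRC"
```

An alternative that avoids the symlink is `conda config --add pkgs_dirs <path>`, which points conda's package cache at the new location directly.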
6.12 Last week's problem, the unauthorized Docker image "quay.io/repository/pacbio/purple", is now authorized, so I pulled the image successfully.
The cnv task had some problems. My reference genome FASTA is already indexed, but cnvkit.py attempts to re-create the index file and fails. I searched for many solutions without success, until I found a report on the cnvkit GitHub issues page describing the same problem. The solution: updating the index's timestamp did the trick.
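As I understand that issue, cnvkit.py rebuilds the .fai when it looks out of date relative to the FASTA, so bumping the index's mtime past the FASTA's stops the (failing) rebuild. A self-contained demonstration with placeholder files and explicit timestamps:

```shell
REF=${REF:-/tmp/demo_ref.fasta}
printf '>chr1\nACGT\n' > "$REF"   # placeholder FASTA
: > "$REF.fai"                    # pretend samtools faidx already ran
touch -t 202401010001 "$REF"      # FASTA newer than index: would trigger re-indexing
touch -t 202401010002 "$REF.fai"  # the trick: make the index newer again
if [ "$REF.fai" -nt "$REF" ]; then
  echo "fai index considered up to date"
fi
```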
Then I hit the problem: No space left on device. I checked the hard disk usage and found that the /share/ directory indeed had no space left, so I copied all the data to the /data/ directory to try again.
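Generic commands for locating the space pressure before re-running (nothing here is specific to this cluster; the mount points are from the log above):

```shell
# Filesystem-level view: which mounts are full?
df -h /share /data 2>/dev/null || df -h
# Directory-level view: the largest items under the target disk.
du -sh /data/* 2>/dev/null | sort -h | tail -n 5
```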
6.19 I ran the complete COLO829 dataset through the pipeline. For small-variant calling I initially used the DeepSomatic software, but it was computationally expensive: it ran for almost two days and still failed on two contigs. I therefore switched to ClairS, which is more time-efficient than DeepSomatic, but its image was difficult to pull and took multiple attempts. The ClairS job is still running.
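For flaky registry pulls, a small retry loop beats re-running by hand. The `retry` helper below is generic; the image URI follows the ClairS project's Docker Hub naming but should be checked against its release notes:

```shell
# Run a command up to $1 times, sleeping between attempts.
retry() {
  max=$1; shift
  n=0
  until "$@"; do
    n=$((n + 1))
    [ "$n" -ge "$max" ] && return 1
    echo "attempt $n failed, retrying..." >&2
    sleep "${RETRY_DELAY:-10}"
  done
}

# Guarded so the sketch is a no-op where Singularity is absent.
if command -v singularity >/dev/null; then
  retry 5 singularity pull docker://hkubal/clairs:latest
fi
```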
7.4 I successfully ran the complete HCC1395 (60X tumor, 40X normal) and COLO829 (60X tumor, 60X normal) datasets through the pipeline and got their reports.
Detection of isoforms and genomic alterations by high-throughput full-length single-cell RNA sequencing in ovarian cancer