cuhk-haosun / course-MBI6013

Material for MSc research project MBI6013
GNU General Public License v3.0

The analysis pipeline for PacBio sequencing data and its applications #5

Open XH-BIM opened 6 months ago

sunhaocuhk commented 6 months ago

Detection of isoforms and genomic alterations by high-throughput full-length single-cell RNA sequencing in ovarian cancer

XH-BIM commented 6 months ago

Phase I: Build an analysis pipeline for PacBio sequencing data (WGS, RNA, and DNA methylation) using Docker images and cluster computing.

XH-BIM commented 4 months ago

2024.05.28 Tried to run the hifisomatic workflow on the HPC with Singularity and miniwdl, but there are some errors that still need to be figured out. The workflow can output a whole-genome copy number profile, small variant (SNV/INDEL) coverage and variant allele frequency (VAF) distribution, mutational signatures, a table of filtered SNVs/INDELs, a table of filtered SVs, and differentially methylated regions overlapping promoters. Its input files are tumor.bam and normal.bam for each patient. It also provides two large datasets, the full COLO829 and HCC1395 datasets.
COLO829 (60X tumor, 60X normal): https://downloads.pacbcloud.com/public/revio/2023Q2/COLO829
HCC1395 (60X tumor, 40X normal): https://downloads.pacbcloud.com/public/revio/2023Q2/HCC1395
[Screenshot 2024-05-28 131139]
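Since the workflow expects one tumor.bam and one normal.bam per patient, a quick pre-flight check can catch missing files before submitting a run. The sketch below assumes a hypothetical layout of one sub-directory per patient under a common root; the function name `find_pairs` and that layout are illustrative assumptions, not something the workflow itself prescribes.

```python
from pathlib import Path


def find_pairs(root):
    """Scan <root>/<patient>/ for tumor.bam and normal.bam (hypothetical layout).

    Returns (complete, incomplete): a dict mapping each patient with both files
    to their paths, and a list of patients missing at least one of the two.
    """
    complete, incomplete = {}, []
    for patient_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        tumor = patient_dir / "tumor.bam"
        normal = patient_dir / "normal.bam"
        if tumor.is_file() and normal.is_file():
            complete[patient_dir.name] = {"tumor": str(tumor), "normal": str(normal)}
        else:
            incomplete.append(patient_dir.name)
    return complete, incomplete
```

Patients listed in `incomplete` can then be fixed or excluded before the workflow inputs are written.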

XH-BIM commented 3 months ago

6.04 [image] [image]

Roughly used the LaTeX format and synchronized it to GitHub

Tried to run the hifisomatic workflow on the NGS cluster with Singularity and miniwdl. I moved Conda's pkgs directory to the /data hard disk, which resolved the lack of space in the home directory. Singularity runs on the NGS cluster, but there are still some errors. [image] docker: "quay.io/pacbio/purple@sha256:8f9a70a1e3c6ee86b5cf41ec31fdc90c7cca744f35d84cd9caa997833245e61d" [image] This image on quay.io is not working, so I plan to look for it on other platforms. Until it is available, the subsequent CNV analysis cannot run either.

XH-BIM commented 3 months ago

6.12 The problem I met last week is resolved: the Docker image purple ("quay.io/repository/pacbio/purple"), which was unauthorized, is now authorized, so I pulled it successfully.

The CNV task had some problems. [image] My reference genome FASTA is already indexed, but cnvkit.py was attempting to re-create the index file and failing. I tried many solutions without success until I found a report of the same problem on the CNVkit GitHub issues page. The solution: [image] Updating the timestamp did the trick.
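A minimal sketch of the timestamp fix, assuming the re-indexing was triggered because the .fai index looked older than the FASTA (for example after copying files between disks). The helper name `refresh_index_timestamp` is hypothetical; it only does what `touch` on the index would do.

```python
import os
from pathlib import Path


def refresh_index_timestamp(fasta_path):
    """Touch <fasta>.fai if it is older than the FASTA itself.

    Returns True if the index timestamp was updated, False if it was
    already at least as new as the FASTA.
    """
    fasta = Path(fasta_path)
    index = Path(str(fasta) + ".fai")
    if not index.exists():
        raise FileNotFoundError(f"no index found at {index}")
    if index.stat().st_mtime < fasta.stat().st_mtime:
        os.utime(index, None)  # set access/modification time to now, like `touch`
        return True
    return False
```

The index content is left unchanged; only its modification time moves forward so that tools comparing timestamps treat it as up to date.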

Then I hit another problem: No space left on device. [image] I checked the disk usage and found that the /share/ directory indeed had no space left, so I copied all the data to the /data/ directory to try again.
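To avoid hitting "No space left on device" partway through a run, free space can be checked before choosing a working directory. This is a rough sketch using only the Python standard library; the helper names and the size threshold in the usage note are assumptions for illustration.

```python
import shutil


def free_gib(path):
    """Return the free space of the filesystem containing `path`, in GiB."""
    return shutil.disk_usage(path).free / 1024**3


def pick_workdir(candidates, needed_gib):
    """Return the first candidate directory with at least `needed_gib` free.

    Candidates that do not exist or cannot be queried are skipped.
    Returns None if no candidate qualifies.
    """
    for path in candidates:
        try:
            if free_gib(path) >= needed_gib:
                return path
        except OSError:
            continue
    return None
```

For example, `pick_workdir(["/share", "/data"], needed_gib=500)` would return "/data" if /share is full, where the 500 GiB figure is only a placeholder for the real requirement of the dataset.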

XH-BIM commented 3 months ago
[image: simple_workflow_diagram]
XH-BIM commented 3 months ago

6.19 I ran the complete COLO829 dataset through this pipeline. For small variant calling, I initially used DeepSomatic, but it was computationally expensive: it ran for almost two days and still ended with errors for two contigs. I therefore switched to ClairS, which is more time-efficient than DeepSomatic, although its image was difficult to pull and took multiple attempts. [image] [image] ClairS is still running. [image]
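Because the image only came down after several manual attempts, the pull can be wrapped in a simple retry loop. The sketch below is generic: `run_with_retries` is a hypothetical helper, and the image URI in the usage note is a placeholder rather than the actual ClairS image reference.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=5, delay_s=30.0, runner=subprocess.run):
    """Run `cmd` until it exits with code 0, up to `attempts` times.

    Returns the number of the attempt that succeeded.
    Raises RuntimeError if every attempt fails.
    `runner` is injectable so the loop can be exercised without a real command.
    """
    for attempt in range(1, attempts + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            time.sleep(delay_s)  # wait before the next try
    raise RuntimeError(f"command failed after {attempts} attempts: {cmd}")
```

A call might look like `run_with_retries(["singularity", "pull", "clairs.sif", "docker://<clairs-image>"])`, with `<clairs-image>` replaced by the real image reference.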

XH-BIM commented 2 months ago

7.4 I successfully ran the complete HCC1395 (60X tumor, 40X normal) and COLO829 (60X tumor, 60X normal) datasets through the pipeline and got their reports.