cuhk-haosun / course-MBI6013

Material for Msc. research project MBI6013
GNU General Public License v3.0

Develop a deep learning model for cell-free DNA analysis #7

Open 223050025 opened 7 months ago

223050025 commented 7 months ago

Talked with Dr. Sun to confirm the research topic, and read the paper "DNA methylation analysis explores the molecular basis of plasma cell-free DNA fragmentation. Nature Communications, 14(287), https://doi.org/10.1038/s41467-023-35959-6".
First, run the code on the public datasets. The final goal is to use a deep learning model to classify and analyse the cell-free DNA dataset, and to build a Docker image.

223050025 commented 7 months ago
  1. Public cfDNA whole genome sequencing datasets: GSE71378, GSE124686, GSE81314
  2. WGBS dataset: CRA001537
Milokita commented 7 months ago

> 1. Public cfDNA whole genome sequencing datasets: GSE71378, GSE124686, GSE81314
> 2. WGBS dataset: CRA001537

Please specify the save path of your data.
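One way to keep the save paths documented in a single place is a small manifest that maps each accession to its archive and local directory. A minimal sketch, not an established project convention: the helper names, the per-accession subdirectory layout under `/share/home/grp-sunhao/liyixiao`, and the output columns are all hypothetical, and the archive is inferred only from the accession prefix.

```python
import csv
import os
import sys

# Archive inferred from the accession prefix (longest matching prefix wins).
ARCHIVES = {
    "GSE": "NCBI GEO series",
    "SRR": "NCBI SRA run",
    "CRA": "CNCB-NGDC Genome Sequence Archive",
    "EGAD": "EGA dataset",
}


def archive_of(accession):
    """Return the archive name for an accession, or 'unknown'."""
    for prefix in sorted(ARCHIVES, key=len, reverse=True):
        if accession.startswith(prefix):
            return ARCHIVES[prefix]
    return "unknown"


def write_manifest(accessions, root, out):
    """Write a tab-separated manifest (accession, archive, save_path) to `out`."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["accession", "archive", "save_path"])
    for acc in accessions:
        # Hypothetical layout: one subdirectory per accession under `root`.
        writer.writerow([acc, archive_of(acc), os.path.join(root, acc)])


if __name__ == "__main__":
    datasets = ["GSE71378", "GSE124686", "GSE81314", "CRA001537"]
    write_manifest(datasets, "/share/home/grp-sunhao/liyixiao", sys.stdout)
```

Keeping the manifest as a plain TSV means it can be committed next to the progress reports and read by any later pipeline step.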

223050025 commented 6 months ago

I tried using the SRA Toolkit to download FASTQ files. For example, GSE71378's raw file ID is SRR061633, and the command `fastq-dump --split-files SRR061633` splits the original paired-end reads into two files, storing the first and second read of each pair separately.

[Screenshots: 2024-04-22 21:59:39 and 2024-04-22 21:59:47]

The data are saved at /share/home/grp-sunhao/liyixiao/SRR061633_1.fastq
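Because `--split-files` writes the two reads of each pair to separate files, a quick sanity check is that both files hold the same number of records with matching read IDs in the same order. A minimal sketch, assuming uncompressed four-line FASTQ records whose IDs carry no mate suffix other than an optional trailing `/1` or `/2` (which is stripped); other suffix conventions would need extra handling. The function names are hypothetical.

```python
import itertools
import sys


def read_fastq_ids(lines):
    """Yield the read ID of each 4-line FASTQ record from an iterable of lines.

    The ID is the header text after '@' up to the first whitespace; a trailing
    '/1' or '/2' mate suffix is removed so that the two mates compare equal.
    """
    it = iter(lines)
    while True:
        header = next(it, "")
        if not header:
            return  # clean end of input
        seq, plus, qual = next(it, ""), next(it, ""), next(it, "")
        fields = header[1:].split()
        if not header.startswith("@") or not fields or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq.rstrip("\n")) != len(qual.rstrip("\n")):
            raise ValueError(f"sequence/quality length mismatch in {fields[0]}")
        read_id = fields[0]
        if read_id.endswith(("/1", "/2")):
            read_id = read_id[:-2]
        yield read_id


def count_pairs(lines_1, lines_2):
    """Count read pairs; raise ValueError if the two files are out of sync."""
    n = 0
    for id_1, id_2 in itertools.zip_longest(read_fastq_ids(lines_1), read_fastq_ids(lines_2)):
        if id_1 is None or id_2 is None:
            raise ValueError(f"files have different numbers of reads (after {n} pairs)")
        if id_1 != id_2:
            raise ValueError(f"pair {n + 1} out of sync: {id_1} vs {id_2}")
        n += 1
    return n


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_pairs.py SRR061633_1.fastq SRR061633_2.fastq
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        print(count_pairs(f1, f2), "read pairs, IDs consistent")
```

Reading line by line keeps memory use constant, which matters for whole-genome cfDNA runs.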

223050025 commented 6 months ago
[Screenshot: 2024-04-30 11:34:17]

@Milokita (senior labmate), how can I download data from the EGA database?

Milokita commented 6 months ago

In short, if it is a controlled-access dataset, you need to submit an access application; otherwise you can simply download it.

223050025 commented 5 months ago

rp.pptx (updated May 25, 2024): please see the 2024.5.28 section of the slides.

223050025 commented 5 months ago

The access request for EGAD00001000856 has been submitted, and I am waiting for a reply from the CUHK Circulating Nucleic Acids Research Group. I will then submit the signed Policy for Plasma DNA data sharing as instructed in the reply email.

223050025 commented 5 months ago

Modified the LaTeX template: added the background section from the proposal, with references, as an example of creating a bibliography. This project is synced with the GitHub repository at 223050025/LiYixiao-template.

223050025 commented 5 months ago

I ran into the following error: `I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'gradients/split_2_grad/concat/split_2/split_dim' with dtype int32 [[{{node gradients/split_2_grad/concat/split_2/split_dim}}]]` How can I solve this problem? @Milokita

223050025 commented 5 months ago

When I run the code on the online platform Kaggle, the problem disappears. However, when I run it in a Jupyter notebook on my own computer, I get the error above.
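Since the same code behaves differently on Kaggle and locally, one way to narrow this down is to record the interpreter and package versions in both environments and compare them. (The quoted log line is tagged `I`, i.e. INFO level, and itself says it can be ignored, so the real failure may appear elsewhere in the output.) A minimal sketch: the package list is an assumption and should match what the notebook actually imports; the function names are hypothetical.

```python
import importlib.metadata
import platform

# Packages to compare; an assumption, adjust to the notebook's imports.
PACKAGES = ("tensorflow", "keras", "numpy")


def environment_report(packages=PACKAGES):
    """Return a dict of Python version, platform, and installed package versions."""
    report = {"python": platform.python_version(), "platform": platform.platform()}
    for name in packages:
        try:
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


def diff_reports(a, b):
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Run this cell on Kaggle and locally, then compare the two printouts.
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Version differences in `tensorflow` or `keras` between the two environments would be a natural first suspect.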

223050025 commented 5 months ago

rp.pptx (updated June 3, 2024): please see the 2024.6.4 section of the slides.

223050025 commented 5 months ago

rp.pptx (updated June 18, 2024): please see the 2024.6.18 section of the slides.

Milokita commented 5 months ago

Please do not use this to submit files; they may be lost in transit.

223050025 commented 4 months ago

When running bowtie2-build on the HPC, I get the error: `/usr/bin/env: python3: No such file or directory`
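The message means the script's `#!` line asks `/usr/bin/env` to find `python3`, and no `python3` is on the `PATH` of that shell. A minimal diagnostic sketch that reads a script's shebang and checks whether the requested interpreter resolves; the helper names are hypothetical, and running it assumes some Python 3 interpreter is reachable on the cluster (for example from a module or conda environment, which is an assumption about that HPC).

```python
import os
import shutil
import sys


def shebang_interpreter(script_path):
    """Return the interpreter a script's '#!' line asks for, or None if absent."""
    with open(script_path, "rb") as fh:
        first_line = fh.readline().decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    # '#!/usr/bin/env python3': env searches PATH for its first argument.
    if os.path.basename(parts[0]) == "env" and len(parts) > 1:
        return parts[1]
    return parts[0]


def resolve_interpreter(script_path):
    """Return (interpreter, resolved_path); resolved_path is None if not found."""
    interpreter = shebang_interpreter(script_path)
    if interpreter is None:
        return None, None
    return interpreter, shutil.which(interpreter)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_shebang.py "$(which bowtie2-build)"
    interp, found = resolve_interpreter(sys.argv[1])
    print(f"shebang asks for {interp!r}; resolved to {found!r}")
```

If the resolved path is `None`, making a `python3` available on `PATH` before calling bowtie2-build should address this particular error.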

223050025 commented 4 months ago

@Milokita

Milokita commented 4 months ago

Please open a new issue for your problem and include detailed information related to your question. This issue is for progress reports ONLY.