epifluidlab / FinaleMe

MIT License

Where do the *WGS.FinaleMe.mincg7.merged.cov.b37.bw and autosome_1kb_intervals.UCSC.cpgIsland_plus_shore.b37.bed files come from? #6

Closed. biwdpang closed this issue 3 weeks ago.

biwdpang commented 6 months ago

Hi, this is good work!

I have completed steps 1-4 successfully, but what is the *WGS.FinaleMe.mincg7.merged.cov.b37.bw file? I can't see where it comes from.

"ls *WGS.FinaleMe.mincg7.merged.cov.b37.bw | perl -ne 'chomp;$cov=$_;$m=$cov;$m=~s/cov/methy_count/;print " -bigWig $m -useMean0 0 -regionMode 0 -bigWig $cov -useMean0 0 -regionMode 0";' >> cfdna.methy_summary.cmd.txt"

Also, can Perl open the bigWig file directly?

In addition, where can I get the autosome_1kb_intervals.UCSC.cpgIsland_plus_shore.b37.bed file used in the next step?

"perl -e '$cmd=`cat cfdna.methy_summary.cmd.txt`;chomp($cmd); `java -Xmx10G -cp "lib/dnaaseUtils-0.14-jar-with-dependencies.jar:lib/java-genomics-io.jar:lib/igv.jar" main.java.edu.mit.compbio.utils.AlignMultiWigInsideBed autosome_1kb_intervals.UCSC.cpgIsland_plus_shore.b37.bed output.add_value.methy.bed.gz $cmd`;'"

heweihuang commented 5 months ago

May I ask how you ran the second step? I encountered an error when running it. Could you provide the relevant executable file? Thank you.

TongZhou202103 commented 3 months ago

Did you encounter the error ‘0 is smaller than, or equal to, the minimum (0)’? Have you solved it?

biwdpang commented 3 months ago

Hi, these are the commands I ran for steps 1-3.

Step 1: extract features from bam files for the training and decoding

step1_max_memory="20G"

cd $outdir &&\

java -Xmx40G -cp "$FinaleMe/FinaleMe-0.58-jar-with-dependencies.jar:$FinaleMe/lib/gatk-package-distribution-3.3.jar:$FinaleMe/lib/sis-jhdf5-batteries_included.jar:$FinaleMe/lib/java-genomics-io.jar:$FinaleMe/lib/igv.jar" \
  org.cchmc.epifluidlab.finaleme.utils.CpgMultiMetricsStats \
  $hg38_2bit $cpgsite $cpgsite $input_bam $outdir/CpgMultiMetricsStats.hg38.details.bed.gz \
  -stringentPaired -excludeRegions $mash_dark_region_hg38_bed -valueWigs methyPrior:0:$wgbs_buffycoat_hg38_bw -wgsMode &&\

Step 2: train the model

step2_max_memory="100G"

java -Xmx40G -cp "$FinaleMe/FinaleMe-0.58-jar-with-dependencies.jar:$FinaleMe/lib/jahmm-0.6.2.jar" \
  org.cchmc.epifluidlab.finaleme.hmm.FinaleMe \
  $outdir/$filename.FinaleMe.mincg7.model $outdir/CpgMultiMetricsStats.hg38.details.bed.gz \
  $outdir/$filename.FinaleMe.mincg7.prediction.bed.gz -miniDataPoints 7 -gmm -covOutlier 3 &&\

Step 3: decode and make the prediction of CpG methylation level

step_3_max_memory="100G"

java -Xmx40G -cp "$FinaleMe/FinaleMe-0.58-jar-with-dependencies.jar:$FinaleMe/lib/jahmm-0.6.2.jar" \
  org.cchmc.epifluidlab.finaleme.hmm.FinaleMe \
  $outdir/$filename.FinaleMe.mincg7.model \
  $outdir/CpgMultiMetricsStats.hg38.details.bed.gz \
  $outdir/$filename.FinaleMe.mincg7.prediction.bed.gz -decodeModeOnly &&\

elicj01 commented 2 months ago

Hi,

Initially, I assumed that *WGS.FinaleMe.mincg7.merged.cov.b37.bw referred to the step 4 outputs. However, these are my results from step 4: test.cov.b37.bw, test.methy.b37.bw, test.methy_count.b37.bw.

Are these the outputs we should expect? Could it be that we need to merge these three files in order to run step 5?

Regarding the autosome_1kb_intervals.UCSC.cpgIsland_plus_shore.b37.bed, I think it can be created using the information from the CpG islands, modifying the coordinates to include the shores (around +/- 2kb), and then dividing it into 1kb intervals.
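The shore-extension idea described above can be sketched as follows. This is only an illustration, not the maintainer's confirmed procedure: the 2 kb shore width is the estimate given in the comment, the coordinates are made-up toy values, the input is assumed to be sorted by chromosome and start, and the extended end is not clipped to the chromosome length.

```sh
workdir=$(mktemp -d)
# Toy CpG-island intervals (hypothetical coordinates), already sorted.
printf 'chr1\t500\t1500\nchr1\t4000\t4600\nchr1\t20000\t21000\n' \
    > "$workdir/toy.cpgIsland.bed"

# Extend each island by 2 kb on both sides (start clamped at 0), then merge
# intervals that overlap or touch after the extension.
awk 'BEGIN { FS = OFS = "\t"; shore = 2000 }
{
    s = $2 - shore; if (s < 0) s = 0
    e = $3 + shore
    if ($1 == chr && s <= end) {
        if (e > end) end = e          # overlaps the open interval: grow it
    } else {
        if (chr != "") print chr, start, end
        chr = $1; start = s; end = e  # open a new merged interval
    }
}
END { if (chr != "") print chr, start, end }' \
    "$workdir/toy.cpgIsland.bed" > "$workdir/toy.cpgIsland_plus_shore.bed"

cat "$workdir/toy.cpgIsland_plus_shore.bed"
```

The merged intervals would then still need to be cut into 1 kb windows, and the end coordinates checked against the chromosome sizes of the genome build in use.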

biwdpang commented 2 months ago

You're right, it's difficult to understand the workflow from step 4 to the end.

So, starting from step 3, I obtained the result using https://github.com/nloyfer/meth_atlas instead.

robinycfang commented 1 month ago

Thanks for your advice, that finally works. However, I am now stuck at step 4: I ran the command but nothing came out.

I looked at the final R script for deconvolution. It seems to work only for hg19 and needs some edits to work for hg38...

elicj01 commented 1 month ago

Hi, you might need to check the bedpredict2bw.b37.pl file, as it retrieves the chromosome sizes from this path: /jet/home/shared/data/genomes/.... I think that changing it to the path where you have the chromosome sizes should make it work.
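If no chromosome sizes file is at hand, one common way to get one is from a FASTA index. The sketch below assumes that bedpredict2bw.b37.pl expects the usual two-column (name, length) layout used by UCSC tools, which has not been verified against the script. The .fai content here is a toy file with made-up contigs; a real one would come from indexing the reference FASTA.

```sh
workdir=$(mktemp -d)
# Toy FASTA index with made-up contigs
# (columns: name, length, offset, bases per line, bytes per line).
printf 'chrA\t5000\t6\t60\t61\nchrB\t3000\t5096\t60\t61\n' > "$workdir/toy.fa.fai"

# The first two columns of a .fai file are the sequence name and its length,
# which is the layout of a chrom.sizes file.
cut -f1,2 "$workdir/toy.fa.fai" > "$workdir/toy.chrom.sizes"

cat "$workdir/toy.chrom.sizes"
```

The chromosome names in this file have to match the naming used in the BED input (for example "1" versus "chr1").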

About the R script, I have not checked it out yet, but I'll be glad if you let me know whether you are able to make it work

biwdpang commented 1 month ago

Haha, you're right. Actually, I have already finished this step, but I don't have the files needed for the next step, so I don't think it matters. As I said above, I found another way to get the result.

dnaase commented 3 weeks ago

I apologize for the delayed response...

"ls *WGS.FinaleMe.mincg7.merged.cov.b37.bw | perl -ne 'chomp;$cov=$_;$m=$cov;$m=~s/cov/methy_count/;print " -bigWig $m -useMean0 0 -regionMode 0 -bigWig $cov -useMean0 0 -regionMode 0";' >> cfdna.methy_summary.cmd.txt"

This is a Perl command that collects all the bigWig files in the same directory and writes them as arguments to cfdna.methy_summary.cmd.txt.

"perl -e '$cmd=`cat cfdna.methy_summary.cmd.txt`;chomp($cmd); `java -Xmx10G -cp "lib/dnaaseUtils-0.14-jar-with-dependencies.jar:lib/java-genomics-io.jar:lib/igv.jar" main.java.edu.mit.compbio.utils.AlignMultiWigInsideBed autosome_1kb_intervals.UCSC.cpgIsland_plus_shore.b37.bed output.add_value.methy.bed.gz $cmd`;'"

This uses the text file generated in the previous step to summarize the methylation level of each sample into a matrix used for the tissue-of-origin analysis in R.
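The two commands above can be sketched end to end in plain shell. The sketch is a dry run under stated assumptions: the sample prefix "sampleA" is hypothetical, the bigWig file is an empty placeholder, and the Java call is only printed, not executed. It also shows that the Perl step only manipulates file names. The bigWig files themselves are read by the Java tool.

```sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Empty placeholder: only the file NAME is used when building the arguments.
: > sampleA.WGS.FinaleMe.mincg7.merged.cov.b37.bw

# Command 1: for each coverage bigWig, derive the methy_count file name and
# append both as -bigWig arguments. The substitution replaces the first "cov"
# in the name, so a sample prefix containing "cov" would be altered too.
for cov in *WGS.FinaleMe.mincg7.merged.cov.b37.bw; do
    m=$(printf '%s\n' "$cov" | sed 's/cov/methy_count/')
    printf ' -bigWig %s -useMean0 0 -regionMode 0 -bigWig %s -useMean0 0 -regionMode 0' "$m" "$cov"
done >> cfdna.methy_summary.cmd.txt

# Command 2: read the argument string back. $args is left unquoted on purpose
# so that the shell splits it into separate arguments.
args=$(cat cfdna.methy_summary.cmd.txt)
set -- $args
n_args=$#

# Dry run: print the final command instead of launching Java.
echo java -Xmx10G -cp "lib/dnaaseUtils-0.14-jar-with-dependencies.jar:lib/java-genomics-io.jar:lib/igv.jar" \
    main.java.edu.mit.compbio.utils.AlignMultiWigInsideBed \
    autosome_1kb_intervals.UCSC.cpgIsland_plus_shore.b37.bed output.add_value.methy.bed.gz "$@"
```

With step 4 outputs named like test.cov.b37.bw and test.methy_count.b37.bw, the same pattern appears to apply once the glob is changed to match those names (for example *.cov.b37.bw), since the substitution maps each cov file to its methy_count counterpart.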

Here is how I generated the file: I downloaded the UCSC cpgIsland annotation file from the UCSC Genome Browser, kept only the entries on autosomes, and then generated 1 kb non-overlapping windows with this Perl one-liner:

cat UCSC.cpgIsland.20190503.b37.autosomes.merged.bed | perl -ne 'chomp;@f=split "\t";$w=1000;for($s=$f[1];$s<$f[2];$s+=$w){$e=$s+$w;if($e>$f[2]){$e=$f[2];}print "$f[0]\t$s\t$e\n";}' > UCSC.cpgIsland.20190503.b37.autosomes.1kb_intervals.bed
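To see what the windowing loop produces, here is the same logic written in awk and applied to a single toy interval. The interval is made up for illustration and the awk version is a rendering of the loop, not the maintainer's script.

```sh
workdir=$(mktemp -d)
# Toy input: one hypothetical merged interval of 2,500 bp.
printf 'chr1\t1000\t3500\n' > "$workdir/toy.merged.bed"

# Step through each interval in 1 kb strides and clip the last window
# to the interval end, as the Perl loop does.
awk 'BEGIN { FS = OFS = "\t"; w = 1000 }
{
    for (s = $2 + 0; s < $3; s += w) {
        e = s + w
        if (e > $3) e = $3
        print $1, s, e
    }
}' "$workdir/toy.merged.bed" > "$workdir/toy.1kb_intervals.bed"

cat "$workdir/toy.1kb_intervals.bed"
```

The 2,500 bp interval yields two full 1 kb windows and one final 500 bp window, so windows at the end of an interval can be shorter than 1 kb.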