dmcblab / InfoGenomeR

9 stars 6 forks source link

SV_performance.R error #1

Closed xc611 closed 2 years ago

xc611 commented 3 years ago

Hi, Try to run your demo1 data:

Rscript SV_performance.R manta.format true_SV_sets_somatic 0 0 Error in if (mode == "precision") { : missing value where TRUE/FALSE needed Execution halted

Thanks

Jack

dmcblab commented 3 years ago

Hello,

Thank you for your comment. We are working on updating the current demo version that may have some issues. Please catch up on the update soon later.

you can get precision, recall, and F-measure by following commands. Rscript SV_performance.R manta.format true_SV_sets_somatic 0 0 precision 0 Rscript SV_performance.R manta.format true_SV_sets_somatic 0 0 recall 0 Rscript SV_performance.R manta.format true_SV_sets_somatic 0 0 fmeasure 0

xc611 commented 3 years ago

Thanks for the help.

I have another question, to run your breakpoint demo1, your command line is:

./breapkpoint_graph/breakpoint_graph.sh somatic sample1 sample1 2 4 bicseq_norm bicseq_norm_germ copy_numbers.control hg19 5 null null 0

  1. what is sample1? tumor sample name?
  2. bicseq_norm, bicseq_norm_germ folder? your demo1 has different names for them, cp_norm.
  3. hg19 just a name or actual sequence like hg19.fa

your demo2 has the same issue, the folder name is different from your command line.

Please help, we really want to use your tool, already spend lots of time installed your tool, and even debugged your code for some of the inconsistency, however there is still confusion in the instruction.

Thanks again

Jack

dmcblab commented 3 years ago

Hello,

  1. The first "sample1" is a tumor sample name that you want to assign. It could be "tumor1", "sample1" or etc. The second "sample1" is a tumor type argument that is passed to ABSOLUTE. Please see the Parameters section of ABSOLUTE, https://www.genepattern.org/modules/docs/ABSOLUTE. It could be "BRCA", "GBM", or etc. The demo1 is a simulated genome, so just type the sample name or NA.

  2. bicseq_norm is a directory that contains binned copy number files from a tumor by BIC-seq2. bicseq_norm_germ is from a normal. For the demo1, the command should be ./breapkpoint_graph/breakpoint_graph.sh somatic sample1 sample1 2 4 cn_norm cn_norm_germ copy_numbers.control hg19 5 null null 0

  3. hg19 is just a prefix of hg19.fa. If you have hg19.fa, just type hg19. Note that the current demo doesn't work with a reference genome with a ".fasta" suffix, which will be fixed soon.

Feel free to ask any question until we release a user-friendly version. Thanks

xc611 commented 3 years ago

Thanks, what is the "NPE_dir" in your script? You didn't sign this parameter in the command line and no explanation of it, it takes the next "tumor_bin_dir" as its parameter, and all parameters are messed up.

dmcblab commented 3 years ago

Thanks again!

You're right. we missed the "NPE_dir" parameter. The NPE_dir should be the directory that contains NPE.fq1 and NPE.fq2 (non-properly paired reads extracted from the tumor bam). For the demo1, the NPE_dir is "./demo1" itself.

xc611 commented 3 years ago

Thanks for the information, I am running demo1 now. One more question, what tool you used to generate copy_number control file? It should come from one of the SV tools, right?

dmcblab commented 3 years ago

Hello, For the somatic mode, the copy_numbers.control file should be generated by BIC-seq2 or InfoGenomeR from a germline genome. We recommend to use InfoGenomeR to generate the copy_numbers.control file.

The following procedure is required before running the somatic mode.

1) cat germline_delly.format germline_manta.format > SVs

2) ./breapkpoint_graph/breakpoint_graph.sh germline control1 control1 2 4 null cn_norm_germ cn_norm_germ null hg19 5 null null 0

This will generate a "copy_numbers" file, and rename it as "copy_numbers.control". Fro the current demo1, we only provided copy_numbers.control, but not germline_delly.format and germline_manta.format.

The tutorial for the germline mode will be updated with the germline SV files.

dmcblab commented 3 years ago

You may need to check first, all the files are generated successfully In the each interation folder (iter1, iter2, ... )

ABSOLUTE_output ABSOLUTE_output/output/reviewed/sample1.test.ABSOLUTE.table.txt CNV.output CNV.output.somatic_format configFile copy_numbers copy_numbers_ABSOLUTE_input copy_numbers_ABSOLUTE_input.negative_marker copy_numbers.CN_opt copy_numbers.CN_opt.debug SVs IP.log new_debugSVs. CN_opt SVs.CN_opt.filtered

The error line Error in if (abs(new$modal_cn[i] - new$raw_expected_cn[i]) > 1) { : missing value where TRUE/FALSE needed  is generated from the integer_programming.R script.

I suspect two possibilities,

1)  some of these files (copy_numbers_ABSOLUTE, CNV_output, or ABSOLUTE_output/*) were not generated successfully, so the script, integer_programming.R was not working. In this case, please check BIC-seq2 is properly installed and redirected in the breakpoint_graph.sh script. Secondly, please check library(ABSOLUTE) is properly installed in R. ABSOLUTE_output/output/reviewed/sample1.test.ABSOLUTE.table.txt file should be generated.

2) If the files, IP.log, new_debug, or copy_numbers.CN_opt were missed, integer_programming.R was not working.   please check library(lpSolve) and library(lpSolveAPI) are properly installed in R. 


Sorry, one more question, in your breakpoint_graph.sh script, around line 99, there is "test1=cat SVs.CN_opt.filtered | wc -l", wondering where does the "SVs.CN_opt.filtered" file was generated.

Also, can you point me where is this line: if (abs(new$modal_cn[i] - new$raw_expected_cn[i]) > 1) { :

I will try to debug there.

I am running the first part of breakpoint_graph.sh script, 1 - 100 lines.

this is error:

Error in if (abs(new$modal_cn[i] - new$raw_expected_cn[i]) > 1) { :

missing value where TRUE/FALSE needed

Execution halted

cat: SVs.CN_opt.filtered: No such file or directory

cp: cannot stat ‘SVs.CN_opt.filtered’: No such file or directory

—You are receiving this because you commented.Reply to this email directly, view it on GitHub, or unsubscribe.

xc611 commented 3 years ago

Great, thanks a lot, I will try out some real data tumor vs normal.

xc611 commented 3 years ago

Hi,

Yes, we installed all the packages and tools, however there are still number of errors in your script.

One request, can you please use one of your simulated data, say, https://zenodo.org/record/4545666/files/GRCh38.diploidy.f10.p0.6.tar.xz

and use your github script, not the working script on your computer, to run through the breakpoint_graph.sh pipeline, and share the corrected script to us? Please.

Thanks

Jack

dmcblab commented 3 years ago

Yes, definitely. We will check the script in an another computer, and release the proper version. Please allow us a few days. Thanks

xc611 commented 3 years ago

Thank you so much


From: dmcb @.> Sent: Wednesday, July 7, 2021 8:11:43 PM To: dmcblab/InfoGenomeR @.> Cc: xc611 @.>; Author @.> Subject: Re: [dmcblab/InfoGenomeR] SV_performance.R error (#1)

Yes, definitely. We will check the script in an another computer, and release the proper version. Please allow us a few days. Thanks

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/dmcblab/InfoGenomeR/issues/1#issuecomment-876040176, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AE7IH3LXNIM4MFFXG57DE3DTWT3M7ANCNFSM47AU34ZQ.

dmcblab commented 3 years ago

Thank you for waiting.

Now, InfoGenomeR v1.0.1 is available, which will be more user-friendly. We specified tool versions (such as R 3.4.3). We did not checked yet other versions can be used (R 3.6.3 was fine, and other recent versions of bwa, blat, and blastn may be okay).

We added some tutorials; 1)tutorial 1 for Germline mode (breakpoint_graph -m germline) and somatic mode (breakpoint_graph -m somatic). 2) tutorial 2 for https://zenodo.org/record/4545666/files/GRCh38.triploidy.f10.p0.75.tar.xz. Of course, you may choose https://zenodo.org/record/4545666/files/GRCh38.diploidy.f10.p0.6.tar.xz as you select above.

Tutorial 3 will be uploaded soon for simplification mode (breakpoint_graph -m simplification), finding SV clusters, and reconstructing karyotypes with Eulerian path finding.

chenxf611 commented 3 years ago

Hi, Sorry for the delay, I got this error:

[1] "removing 48 / 49 modes outside of alpha/tau range." [1] "Optimizing LL(data, theta.qz, sigma.h | comb) for 1 modes: " SSSSSSSSSS sigma.h.hat=0.01946 [1] "Evaluating subclonal SCNAs in 1 purity/ploidy modes: " [1] "Switching to total copy Karyotype model..." [1] "Switching to total copy Karyotype model..." .There were 50 or more warnings (use warnings() to see the first 50) pass Ploidy and purity estimation is done Integer programming... Error in fread(paste(Sys.getenv("InfoGenomeR_lib"), "/humandb/", Sys.getenv("Ref_version"), : File '/humandb/.repeatmasker' does not exist or is non-readable. getwd()=='/chenx3/infor2/GRCh38.diploidy.f10.p0.6/InfoGenomeR_job/iter1'

Just wondering if the .repeatmasker is regular repeatmasker?

Thanks

Jack

dmcblab commented 3 years ago

Hello, Please check whether the Ref_version variable is assigned (export Ref_version=GRCh38). Then, please check whether GRCh38.repeatmasker exists in $InfoGenomeR_lib/humandb/. It is just a regular repeatmasker, downloaded from the ucsc table browser, https://genome.ucsc.edu/cgi-bin/hgTables. Or, you can get https://zenodo.org/record/5112761/files/GRCh38.repeatmasker.xz. The file name should be "GRCh38.repeatmasker".

Thanks

chenxf611 commented 3 years ago

Hi,

I followed your Tutorial 2 step by step, started from fresh, download your git and setup the env everything, however, at the end of iter1, I got the error I had it before:

Ploidy and purity estimation is done Integer programming... Error in if (abs(new$modal_cn[i] - new$raw_expected_cn[i]) > 1) { : missing value where TRUE/FALSE needed Execution halted

I checked your post for this issue, you had two suggestions for me (see above your post), I checked:

  1. ABSOLUTE_output/output/reviewed/sample1.test.ABSOLUTE.table.txt file is there.
  2. two libraries were installed in R: Type 'q()' to quit R.

library(lpSolve) library(lpSolveAPI)

what else I missed here?
Were you able to finish the iter1 without any problem by using your Tutorial 2 steps?

Thanks

Jack

dmcblab commented 3 years ago

Hello,

Yes, the iter1 should be finished without that error. I uploaded the iter1 folder, and please check the following command works without the error.

please check your settings

export InfoGenomeR_lib=/home/dmcblab/InfoGenomeR export PATH=$InfoGenomeR_lib/breakpoint_graph:$InfoGenomeR_lib/allele_graph:$InfoGenomeR_lib/haplotype_graph:$PATH export Ref_version=GRCh38 ls -l $InfoGenomeR_lib/humandb/GRCh38.repeatmasker ##################

wget http://gcancer.org/InfoGenomeR/iter1.tar.gz tar -xvf iter1.tar.gz cd iter1 Rscript integer_programming.R sample1 copy_numbers_ABSOLUTE_input.negative_marker SVs 100 T > IP.log

If that works, some steps were missed before integer programming. Please check the files in the iter1 folder are the same with them in your iter1 folder. In addition, would you please send your iter1 folder to my email (qlalf1457@gist.ac.kr) ? I will check what is wrong.

Thanks