TranslationalBioinformaticsIGTP / CNVbenchmarkeR

Framework to benchmark algorithms when detecting germline copy number variations (CNVs) from NGS data
MIT License

CNVbenchmarkeR: summary.R Execution halted #1

Closed fultonardo closed 4 years ago

fultonardo commented 4 years ago

Greetings!

I am using the ICR96 dataset to set up two algorithms, DECoN and panelcn.MOPS, for evaluation with CNVbenchmarkeR. I preprocessed the ICR96 dataset with HG19. CNVbenchmarkeR properly produces output for both programs, but fails to complete the summary.R script due to two different errors, described below. Any advice for troubleshooting these errors would be very valuable! If you need more details, please let me know. Thank you, -Matt

Note: I have the two programs running through CNVbenchmarkeR on two different machines: Ubuntu 18.04/R 3.4.4 (DECoN) and Ubuntu 20.04/R 4.0.2 (panelcn.MOPS).


DECoN: Ubuntu 18.04, R 3.4.4

Output- Upon completion of runBenchmark.sh the output folder contains the 6 output files expected from DECoN:
CNVbenchmarkeR/output/decon-DECoN/calls.RData
CNVbenchmarkeR/output/decon-DECoN/calls_all.txt
CNVbenchmarkeR/output/decon-DECoN/failedROIs.csv
CNVbenchmarkeR/output/decon-DECoN/failures_Failures.txt
CNVbenchmarkeR/output/decon-DECoN/grPositives.rds
CNVbenchmarkeR/output/decon-DECoN/output.bams.RData

Logs- decon.log and summary.log are produced in the logs folder.

decon.log- DECoN executes to completion with the following warning messages:
Warning messages:
1: In .Seqinfo.mergexy(x, y) : Each of the 2 combined objects has sequence levels not in the other:

summary.log- complete output below:
[1] "/home/cnv18/CNVbenchmarkeR"
[1] "algorithms.yaml" "datasets.yaml"
Loading DECoN Dataset validated results
Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed
Calls: -> colnames -> read.csv -> read.table
In addition: Warning messages:
1: In read.table(file = file, header = header, sep = sep, quote = quote, : line 1 appears to contain embedded nulls
2: In read.table(file = file, header = header, sep = sep, quote = quote, : line 2 appears to contain embedded nulls
3: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : EOF within quoted string
4: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : embedded nul(s) found in input
Execution halted
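The "embedded nulls" warnings above usually mean R is reading bytes that are not plain text, for example a binary spreadsheet or a UTF-16 export. Whether that is the cause here is an assumption. A minimal Python sketch (not part of CNVbenchmarkeR; the function names are hypothetical) for checking a file's raw bytes before handing it to the summary step:

```python
def count_nul_bytes(data: bytes) -> int:
    """Return how many NUL (0x00) bytes appear in the raw file contents."""
    return data.count(b"\x00")


def looks_like_plain_text(data: bytes) -> bool:
    """Heuristic: a plain-text table has no NUL bytes and decodes as UTF-8.

    This is an illustrative check only; it does not validate the table layout.
    """
    if count_nul_bytes(data) > 0:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

It could be used as `looks_like_plain_text(open(path, "rb").read())`, where `path` points at the validated results file; a `False` result suggests re-exporting the file as plain tab-delimited text.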


panelcn.MOPS: Ubuntu 20.04, R 4.0.2

Output- Upon completion of runBenchmark.sh the output folder contains the 3 output files expected from panelcn.MOPS:
CNVbenchmarkeR/output/panelcn-ICR_19/cnvFounds.txt
CNVbenchmarkeR/output/panelcn-ICR_19/failedROIs.csv
CNVbenchmarkeR/output/panelcn-ICR_19/grPositives.rds

Logs- panelcnmops.log and summary.log are produced in the logs folder.

panelcnmops.log- panelcn.MOPS executes to completion with the following warning messages:
There were 50 or more warnings (use warnings() to see the first 50)
[1] "Finishing at 2020-10-08 11:06:11"

summary.log- complete output below:
[1] "algorithms.yaml" "datasets.yaml"
Loading ICR_19 Dataset validated results
Loading panelcn results for ICR_19 dataset
Error in names(x) <- value : 'names' attribute [6] must be the same length as the vector [0]
Calls: -> colnames<- -> colnames<-
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Execution halted

jpuntomarcos commented 4 years ago

Hi Matt,

It seems you are getting the first error when calling ss$loadValidatedResults(). What does your validated results file look like? It should have the same format as the example that I provided.

The second error is difficult to diagnose without the files you used. Did panelcn.MOPS produce correct results files?

Regards, José Marcos.

fultonardo commented 4 years ago

Greetings José,

Thank you for your willingness to help me troubleshoot CNVbenchmarkeR!! If I can get it up and running with both DECoN and panelcn.MOPS it will save loads of time & effort!

While digging through the R scripts I noticed that some of the input files are read with the read.csv function, so I decided to try input files saved in different formats. Initially I had used the validation file for the ICR96 dataset from supplementary file 2 (Moreno-Cabrera et al. 2020), which is provided as an .xlsx file.

panelcn.MOPS: Ubuntu 20.04, R 4.0.2

Try #1) Validation file from downloaded file #S2 saved from .xlsx as a .csv (comma-delimited) file: panelcn.MOPS produces output, but CNVbenchmarkeR does not produce the summary folder in output.

Try #2) Validation file from downloaded file #S2 saved as .csv (tab-delimited): panelcn.MOPS produces output & CNVbenchmarkeR produces the summary folder containing 2 files: results_table.csv & summary.txt.

Current problem (I haven't had a chance to look too deeply into this yet): the panelcn.MOPS output folder contains the file 'cnvFounds.txt', which contains CN1 deletions & CN3 duplications. However, the results file output>summary>results...csv reports 0 positive calls (0 TP, 0 FP, 0 sensitivity, 0 F1).

Here are links to the input and output files from panelcn.MOPS Try #2 above:
https://github.com/fultonardo/troublehshooting_CNVbenchmarkeR/blob/main/panelcn_output.zip
https://github.com/fultonardo/troublehshooting_CNVbenchmarkeR/tree/main/panelcn_output


DECoN: Ubuntu 18.04, R 3.4.4

The DECoN algorithm will not execute with the bed file written as in the example, regardless of whether the validated.csv file is saved as comma-delimited or tab-delimited. However, if I change the chromosome column of the bed file from the chromosome number (1, 2, 3...) to (chr1, chr2, chr3...) then it will execute the DECoN algorithm. This produces the DECoN output files listed below, but does not produce the summary folder or files:
calls.RData
calls_all.txt
failedROIs.csv
failures_Failures.txt
grPositives.rds
output.bams.RData

Here are links to the input and output files from the DECoN run:
https://github.com/fultonardo/troublehshooting_CNVbenchmarkeR/blob/main/decon-files.zip
https://github.com/fultonardo/troublehshooting_CNVbenchmarkeR/tree/main/decon-DECoN_bedw_CHROM._valid-tab.csv

jpuntomarcos commented 4 years ago

Hi Matt,

I am happy that your "try 2" worked because you used a tab-delimited format. Please, remember to check the example I provided in order to follow the exact format: it has to be tab-delimited and have the same column names and order.
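The format requirement above (tab-delimited, same column names, same order) can be checked by comparing the header row of a candidate file against the header row of the provided example. A minimal Python sketch, assuming both files are read as text; the function names are hypothetical and no actual CNVbenchmarkeR column names are assumed:

```python
import csv
import io
from typing import List


def read_header(text: str, delimiter: str = "\t") -> List[str]:
    """Return the first row of a delimited text table as a list of column names."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return next(reader, [])


def header_matches(example_text: str, candidate_text: str) -> bool:
    """True when both tables have the same column names in the same order.

    Both inputs are parsed as tab-delimited, so a comma-delimited candidate
    collapses into a single column and fails the comparison.
    """
    return read_header(example_text) == read_header(candidate_text)
```

A comma-delimited export fails this check even when its column names are right, which matches the difference between "try 1" and "try 2".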

Now, your problem when generating the Summary is that you are providing different sample names in the cnvFounds.txt file (17296sorted, 17297sorted, etc) than in the validated_results_file (17296, 17297). A similar problem is happening when you execute the Summary for DECoN.
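The mismatch described above (17296sorted versus 17296) can be detected before running the Summary. A minimal Python sketch, assuming the only difference is a trailing "sorted" suffix on the called sample names; the function names are hypothetical and this is not how CNVbenchmarkeR itself matches samples:

```python
def normalize_sample(name: str, suffix: str = "sorted") -> str:
    """Strip a trailing suffix (assumed here to be 'sorted') from a sample name."""
    if suffix and name.endswith(suffix):
        return name[: -len(suffix)]
    return name


def unmatched_samples(called, validated, suffix="sorted"):
    """Return called sample names that are absent from the validated set
    even after the suffix has been stripped."""
    validated_set = set(validated)
    return sorted(
        name for name in called
        if normalize_sample(name, suffix) not in validated_set
    )
```

An empty list from `unmatched_samples` means every called sample has a counterpart in the validated results once the suffix is removed; renaming the samples at the source, as suggested above, remains the simpler fix.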

Also, your DECoN execution has another problem: the chromosome names in your bed file differ from those in the other files. I am attaching the exact bed file I am using (it is the published one, in tab-delimited format): bed.bed.zip

Hope you can fix all your issues, José Marcos.

fultonardo commented 4 years ago

Greetings José,

Ahh, that makes sense! I will rename the samples and retry.

Thanks again for all of your help!! -Matt Marshall

fultonardo commented 4 years ago

Greetings José,

I wanted to follow up to let you know that I was able to get both panelcn.MOPS and DECoN working properly in CNVbenchmarkeR, thank you again for your help!!

Here are some useful tips that may help others who run into similar errors.

I was able to get CNVbenchmarkeR to run panelcn.MOPS with R 4.0.2 in Ubuntu 20 and the following input:
1- bed_file: must be saved as a tab-delimited .bed file --> matches the example file exactly
2- validated_results_file: must be saved as a tab-delimited .txt file (this does not match the example file exactly, because the example file in the examples folder is a .csv and the supplementary file provided in the publication is an .xlsx)
3- Chromosome labels must be only a number (e.g., 1, 2, 3,...)

I was able to get CNVbenchmarkeR to run DECoN with R 3.4.4 in Ubuntu 18 and the following input:
1- bed_file: must be saved as a tab-delimited .bed file --> does not match the example file: chromosome labels must be chr1, chr2, etc.
2- validated_results_file: must be saved as a tab-delimited .txt file and chromosome labels must be chr1, chr2, etc.

note- I got the idea to use chr1, chr2, etc. as opposed to 1, 2, etc. because the HG_19.fa file has chromosomes labeled as chr1, chr2, etc...
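Switching a bed file between the two labelling styles (1, 2, ... and chr1, chr2, ...) only touches the first column. A minimal Python sketch, assuming a tab-delimited bed file whose first column is the chromosome; the function names are hypothetical, and special cases such as mitochondrial naming are not handled:

```python
def add_chr_prefix(chrom: str) -> str:
    """Turn '1' into 'chr1'; leave labels that already start with 'chr' alone."""
    return chrom if chrom.startswith("chr") else "chr" + chrom


def strip_chr_prefix(chrom: str) -> str:
    """Turn 'chr1' into '1'; leave labels without the prefix alone."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def relabel_bed_line(line: str, convert) -> str:
    """Apply `convert` to the chromosome column of one tab-delimited bed line.

    Blank lines and '#' comment lines are returned unchanged.
    """
    if not line.strip() or line.startswith("#"):
        return line
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    fields[0] = convert(fields[0])
    return "\t".join(fields) + newline
```

Applying `relabel_bed_line(line, add_chr_prefix)` to every line of a bed file gives the chr-style labels; `strip_chr_prefix` gives the numeric style. Whichever style is chosen, the bed file, the validated results file and the reference FASTA need to agree.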


Finally, the exact input files that run properly on the panelcn.MOPS computer do not produce any output on the DECoN computer. DECoN does not even run past the 'evaluating GC content...' line when chromosomes are not labeled as chr1, chr2, etc.

The exact input files that run properly on the DECoN computer do produce panelcn.MOPS output but do not produce the CNVbenchmarkeR summary files.

My guess is that this is because the DECoN computer is running older versions of everything, including Bioconductor and Bioconductor-related packages. However, I do not plan to look into this further unless it becomes relevant down the road!

Best wishes, -Matt

jpuntomarcos commented 3 years ago

Thanks for your feedback. I have made some updates to the README file to better guide other users.

Best, José Marcos.