Issue in gc.correct.wgs

quentinba commented 4 years ago

Hi, I just installed Battenberg and tried to run it on my WGS data. Unfortunately, I had some issues during the gc.correct.wgs R function step.

After investigating, I noticed that since Battenberg v2.2.9, gc.correct.wgs takes a new argument, "replic_timing_file_prefix", which has no default value and isn't handled by the Perl wrapper.

This results in incomplete processing of the data.

Moreover, I'm not sure the X chromosome renaming part in https://github.com/cancerit/cgpBattenberg#prerequisites is still relevant.

Finally, I had to rename chromosomes following this guide: https://github.com/cancerit/cgpBattenberg/wiki/Reference-name-conventions. However, it should also explain how to rename the chromosome names in the GC correction files.

I personally used this:

for f in /dir/to/battenberg_wgs_gc_correction_1000g_v3/*; do cp "$f" "$f~"; gzip -cd "$f~" | awk -v OFS=$'\t' '{ $2="chr" $2; print}' | sed "s/chrPosition/Position/g" | gzip > "$f"; rm "$f~"; done
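
A slightly more defensive variant of the same idea, as a sketch only. It assumes what the one-liner implies: each file is gzipped and tab-separated, line 1 is a header, and the chromosome sits in field 2 of the data rows. The helper name add_chr_prefix is hypothetical and not part of cgpBattenberg. It leaves the header alone, does not double-prefix rows that already start with "chr", and only replaces the original file if the whole pipeline succeeded.

```shell
#!/usr/bin/env bash
# Sketch: add a "chr" prefix to field 2 of one gzipped, tab-separated file.
# Assumptions: line 1 is a header; field 2 of data rows is the chromosome.
# add_chr_prefix is a hypothetical helper name, not part of cgpBattenberg.
add_chr_prefix() {
  local f="$1" tmp
  tmp="$(mktemp)" || return 1
  # Run the pipeline in a subshell so pipefail does not leak to the caller.
  if (
    set -o pipefail
    gzip -cd "$f" \
      | awk 'BEGIN { FS = OFS = "\t" }
             NR > 1 && $2 !~ /^chr/ { $2 = "chr" $2 }   # data rows only, no double prefix
             { print }' \
      | gzip > "$tmp"
  ); then
    mv "$tmp" "$f"      # replace the original only on success
  else
    rm -f "$tmp"        # keep the original untouched on failure
    return 1
  fi
}

# Usage (same directory as in the one-liner above):
# for f in /dir/to/battenberg_wgs_gc_correction_1000g_v3/*; do add_chr_prefix "$f"; done
```

Because rows that already carry the prefix are skipped, running it twice on the same file should leave the file unchanged.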

Regards, Quentin

rulixxx commented 4 years ago

Hi Quentin,

Are you sure you used the most recent release of cgpBattenberg? I updated the code about a week ago so that the wrapper now uses the most recent release of Battenberg.

I made some changes to the wrapper (some related to replic_timing_file_prefix); see: https://github.com/cancerit/cgpBattenberg/compare/3.6.0...3.7.0 . I have run that new version successfully with several pairs of samples.

The chromosome handling has been changed in the new versions to accommodate references with or without the chr prefix.

quentinba commented 4 years ago

Hi Raul, Thanks for your fast answer.

Indeed, I'm using version 3.6.0 of cgpBattenberg.

So I'll update the wrapper and go back to the latest version of Battenberg. But with 3.6.0 I still have issues with the chr prefixes, as the output of GCwindowCorrelations_afterCorrection.txt looks like this:

windowsize  correlation
windowsize  NA
correlation NA

So if my BAM files contain the chr prefix and my command line is:

battenberg.pl -o $out_dir -r $ref_fai -tb $tumor_bam -nb $nontumor_bam -e impute_info_withCHR.txt -u bberg_ref_files/1000genomesloci -c probloci.txt -t 16 -ig ignore_contigs.txt -gc-correction-loc bberg_ref_files/battenberg_wgs_gc_correction_1000g_v3 -gender XX

I'm not sure which input data I should rename. I suppose that I must rename the 1000genomesloci files and probloci.txt as recommended in https://github.com/cancerit/cgpBattenberg/wiki/Reference-name-conventions. But should I also rename the files in the battenberg_wgs_gc_correction_1000g_v3 folder and the impute_info.txt file?
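
This does not settle which files need renaming, but a quick check like the sketch below shows which naming convention each input currently uses, so mismatches are easier to spot. The helper naming_style is hypothetical. The column positions are assumptions: field 1 for a .fai index (the contig name), and field 2 after one header line for the GC correction files, as the renaming one-liner earlier in this thread implies.

```shell
#!/usr/bin/env bash
# Sketch: report whether a column of tab-separated text uses "chr"-prefixed names.
# naming_style COLUMN [SKIP_LINES] reads stdin and prints one of:
#   chr | nochr | mixed | empty
# naming_style is a hypothetical helper name, not part of cgpBattenberg.
naming_style() {
  awk -v col="$1" -v skip="${2:-0}" '
    BEGIN { FS = "\t" }
    NR <= skip   { next }            # skip header lines
    $col ~ /^chr/ { pre++; next }    # value starts with "chr"
    { bare++ }                       # anything else
    END {
      if (pre && bare) print "mixed"
      else if (pre)    print "chr"
      else if (bare)   print "nochr"
      else             print "empty"
    }'
}

# Usage (paths taken from the command line above):
# naming_style 1 < "$ref_fai"
# for f in bberg_ref_files/battenberg_wgs_gc_correction_1000g_v3/*; do
#   printf '%s\t' "$f"; gzip -cd "$f" | naming_style 2 1
# done
```

A reference index that includes decoy or unplaced contigs without the prefix may report "mixed" even when the main chromosomes are all chr-prefixed, so the result is a hint rather than a verdict.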
