AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project
Other
101 stars 67 forks source link

change to kf-openaccess-us-east-1-prd-pbta/data #1121

Closed kgaonkar6 closed 3 years ago

kgaonkar6 commented 3 years ago

Purpose/implementation Section

What scientific question is your analysis addressing?

v20 CI testing files check

What was your approach?

Changed the s3 bucket path to https://s3.amazonaws.com/kf-openaccess-us-east-1-prd-pbta/data and copied testing_v20.zip files to testing folder

What GitHub issue does your pull request address?

1048

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Add Shatterseek errored out here https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/1121/commits/e6028f7ef7975ecdc790ada77a55874b84566683

[1] "Running: BS_20TBZG09 (4 of 17)"
Error in if ((seg1$chrom == seg2$chrom) & (seg1$copy.num == seg2$copy.num)) { : 
  missing value where TRUE/FALSE needed
Calls: mergeCNsegments -> compare_adjacent_segs
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Execution halted

so I commented it out to run CI.

Will try to fix below

kgaonkar6 commented 3 years ago
[1] "Running: BS_20TBZG09 (4 of 17)"
Error in if ((seg1$chrom == seg2$chrom) & (seg1$copy.num == seg2$copy.num)) { : 
  missing value where TRUE/FALSE needed
Calls: mergeCNsegments -> compare_adjacent_segs
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Execution halted

It seems like the code errors because the test cnv consensus data only has 1 row for BS_20TBZG09 which I tried to fix it with https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/1121/commits/a273dcc7756ac091f24010e6a3adddd74205bb34

cnv_consensus %>% filter(ID %in% "BS_20TBZG09",!is.na(cnv_consensus$copy.num)) 
# A tibble: 1 x 7
  ID          chrom loc.start   loc.end num.mark seg.mean copy.num
  <chr>       <chr>     <dbl>     <dbl> <lgl>       <dbl>    <dbl>
1 BS_20TBZG09 chr12 132132000 132470393 NA          0.300        3
jaclyn-taroni commented 3 years ago

The tests currently pass and there are no changes, outside of those to the download data step, to the CircleCI config file?

kgaonkar6 commented 3 years ago

In addition to download data and circleCI config file, an update to 02-run-shatterseek-and-classify-confidence.R here https://github.com/AlexsLemonade/OpenPBTA-analysis/commit/a273dcc7756ac091f24010e6a3adddd74205bb34 was needed to pass CI checks

jaclyn-taroni commented 3 years ago

Yes, I saw that. I guess my point is that you didn't just try to solve the problem, you successfully solved the problem and that doesn't preclude merging this!

kgaonkar6 commented 3 years ago

Sorry for the confusion, I only added the 02-run-shatterseek-and-classify-confidence.R code update as a code review update for the CI to pass.

While rerunning the chromothripsis module I am erroring out in another place which I am still trying to figure out :

Evaluating the statistical criteria
Successfully finished!
[1] "Running: BS_JJNSP29S (102 of 777)"
Running..

Evaluating the statistical criteria
Error in seq.default(max(idxa), min(idxb), 1) : 
  'to' must be a finite number
Calls: shatterseek ... fmax2 -> nrow -> [ -> [.data.frame -> seq -> seq.default
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Execution halted
kgaonkar6 commented 3 years ago

Assistance from Chromothripsis Analysis authors @LauraEgolf @yangyangclover will be helpful, I couldn't figured out why the sample BS_JJNSP29S errors out of shatterseek in v20.

jaclyn-taroni commented 3 years ago

@kgaonkar6 is this a problem with this pull request, the CI files, or a separate issue that comes up when running the entire pipeline (e.g., not using tested files)? I am confused.

kgaonkar6 commented 3 years ago

The 02-run-shatterseek-and-classify-confidence.R with the update https://github.com/AlexsLemonade/OpenPBTA-analysis/commit/a273dcc7756ac091f24010e6a3adddd74205bb34 works on the testing data (since I figured out the issue was that code could not handle cnv_current dataframes that had only 1 row) which then passes the CI.

Now I am trying to run the chromothripsis module on v20 data release to update the module before we can merge to master and I get the issue below:

Evaluating the statistical criteria
Successfully finished!
[1] "Running: BS_JJNSP29S (102 of 777)"
Running..

Evaluating the statistical criteria
Error in seq.default(max(idxa), min(idxb), 1) : 
  'to' must be a finite number
Calls: shatterseek ... fmax2 -> nrow -> [ -> [.data.frame -> seq -> seq.default
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Execution halted
jaclyn-taroni commented 3 years ago

Does the output of that script go into the release or is it required for subtyping? If not, then it doesn’t preclude release in my opinion and we should file an issue that is separate from the release process.

kgaonkar6 commented 3 years ago

Ok sounds good, thanks for clarifying!

The outputs from chromothripsis are not released and not used in subtyping. I'll write up a different issue.