PGScatalog / pgsc_calc

The Polygenic Score Catalog Calculator is a nextflow pipeline for polygenic score calculation
https://pgsc-calc.readthedocs.io/en/latest/
Apache License 2.0

Make reports shareable by default #367

Open nebfield opened 1 week ago

nebfield commented 1 week ago

Description of feature

The report.html is so useful but I can’t share it easily because it shows some of the PGS scores with sample IDs in the head of the score data tibble.

We could:

1) Remove the sample ID column from the tibble, or
2) Just delete the tibble (probably safest)

bnwolford commented 1 week ago

I believe it depends on the biobank whether a PGS value without the pseudonymized ID is considered summary-level information and therefore less subject to data-sharing restrictions. The summary plot should be safe to treat as summary-level data, as long as the x-axis precision and the number of markers in the score are such that one cannot easily work backwards from the weights file to possible genotype combinations.
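To make the "low x-axis precision" idea concrete, here is a minimal sketch in plain Python. It is not the pipeline's actual report code: the function name `coarse_histogram`, the `bin_width` parameter, and the example scores are all made up for illustration. The point is only that what gets shared is a count per coarse bin, not any individual score.

```python
import math
from collections import Counter


def coarse_histogram(scores, bin_width=0.5):
    """Count scores per bin of width `bin_width`.

    Hypothetical helper: keys are the lower edge of each bin, so the
    output carries no more precision than `bin_width`.
    """
    counts = Counter(math.floor(s / bin_width) * bin_width for s in scores)
    return dict(sorted(counts.items()))


# Made-up scores, only to show the shape of the output.
example_scores = [-1.23, -0.41, 0.07, 0.12, 0.48, 0.93, 1.75]
histogram = coarse_histogram(example_scores, bin_width=0.5)
print(histogram)
```

A wider `bin_width` gives less precision on the x-axis; what counts as coarse enough would presumably depend on the score and on the biobank's rules.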

smlmbrt commented 1 week ago

I think we'll just go with option 2, as it avoids the issue altogether. Do you think that drawing lines on top of a distribution is OK for displaying the results of a few samples (https://github.com/PGScatalog/pgsc_calc/issues/345)? That would have low precision, as you suggest.

bnwolford commented 1 week ago

I think the lines are OK as long as the x-axis has low precision. Technically there could be an issue if we consider that a summary statistic on fewer than 5 or 30 people, or whatever minimum is specified in an ethics application. But the histogram is a really great sanity check, so I wouldn't want to remove that feature. Perhaps it could just be a toggle if people have an issue with it?
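As a rough sketch of how a minimum-count rule and a toggle could combine (plain Python, not an existing pgsc_calc option; `show_sample_lines`, `min_samples`, and `lines_enabled` are hypothetical names, and the default of 5 is just one of the thresholds mentioned above):

```python
def show_sample_lines(n_samples, min_samples=5, lines_enabled=True):
    """Decide whether per-sample lines are drawn on the distribution.

    Hypothetical logic: lines are drawn only when the user has left the
    toggle on AND the number of samples meets the minimum that their
    ethics application (or biobank) specifies.
    """
    if n_samples < 0 or min_samples < 0:
        raise ValueError("sample counts must be non-negative")
    return lines_enabled and n_samples >= min_samples


# Three samples with a minimum of five: lines are suppressed.
print(show_sample_lines(3, min_samples=5))
# Forty samples with a stricter minimum of thirty: lines are drawn.
print(show_sample_lines(40, min_samples=30))
# The user switched the feature off entirely.
print(show_sample_lines(40, min_samples=30, lines_enabled=False))
```

Keeping the threshold as a parameter rather than hard-coding it reflects that the minimum differs between ethics applications.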