Open nebfield opened 1 week ago
I believe it depends on the biobank whether a PGS value without the pseudonymized ID counts as summary-level information and is therefore less subject to data-sharing restrictions. The summary plot should be safe enough as summary-level data as long as the x-axis precision and the number of markers in the score are such that one cannot easily work backwards from the weights file to possible genotype combinations.
I think we'll just go with option 2, as it avoids the issue altogether. Do you think that lines on top of a distribution are OK for displaying the results of a few samples (https://github.com/PGScatalog/pgsc_calc/issues/345)? That would have low precision, as you suggest.
I think the lines are OK as long as the x-axis has low precision. I could see there technically being an issue if we treat this as a summary statistic on fewer than 5 or 30 people, or whatever minimum is specified in an ethics application. But the histogram is a really great sanity check, so I wouldn't want to remove that feature. Perhaps it could just be a toggle if people have an issue with it?
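To make the "low precision plus minimum group size" idea concrete, here is a minimal sketch, not how pgsc_calc does it. The function name `coarse_histogram`, the bin width of 0.5, and the threshold of 5 are all assumptions chosen for illustration; the real minimum would come from the relevant ethics application.

```python
import math
from collections import Counter


def coarse_histogram(scores, bin_width=0.5, min_count=5):
    """Bin scores at low precision and suppress sparsely populated bins.

    Hypothetical helper: bin_width and min_count are illustrative defaults,
    not values taken from pgsc_calc or any biobank policy.
    """
    # Map each score to the left edge of its bin (coarse x-axis precision).
    counts = Counter(math.floor(s / bin_width) * bin_width for s in scores)
    # Report a count only when the bin holds at least min_count samples;
    # smaller bins are returned as None so they can be hidden in a plot.
    return {
        left: (n if n >= min_count else None)
        for left, n in sorted(counts.items())
    }


scores = [0.1, 0.2, 0.3, 0.4, 0.45, 1.2]
print(coarse_histogram(scores))
```

With these inputs the bin starting at 0.0 holds five scores and is reported, while the bin starting at 1.0 holds a single score and is suppressed.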
Description of feature
We could:
1) Remove the sample ID column from the tibble, or
2) Just delete the tibble (probably safest)