Grouped columns - Githubissues

SeaGenGosink commented 2 years ago

Is it possible to include an option for grouped columns? For example, an additional argument that takes a tibble with as many rows as there are columns in the Prism data, where each column in the tibble is a categorical vector that groups the Prism columns? Thanks!
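A minimal sketch of what such a grouping table could look like, in base R. The names `prism_data`, `column_groups`, and the group labels are made up for illustration; none of this is part of the pzfx API.

```r
# Hypothetical Y-only data: six columns that belong to two groups.
prism_data <- data.frame(
  ctrl_1 = c(1.0, 2.0), ctrl_2 = c(1.1, 2.1), ctrl_3 = c(0.9, 1.9),
  trt_1  = c(3.0, 4.0), trt_2  = c(3.1, 4.1), trt_3  = c(2.9, 3.9)
)

# Hypothetical grouping table: one row per column of prism_data.
column_groups <- data.frame(
  column = names(prism_data),
  group  = rep(c("Control", "Treated"), each = 3),
  stringsAsFactors = FALSE
)

# The proposal requires exactly one metadata row per data column.
stopifnot(nrow(column_groups) == ncol(prism_data))

# Split the column names by group, keeping groups in first-appearance order,
# then pull out one sub-table per group.
grouped_names <- split(
  column_groups$column,
  factor(column_groups$group, levels = unique(column_groups$group))
)
grouped_data <- lapply(grouped_names, function(cols) prism_data[, cols, drop = FALSE])
```

Each element of `grouped_data` then holds the replicate columns for one group, which is roughly the information a grouped Prism column would need.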

Yue-Jiang commented 2 years ago

Thanks, an option to handle grouped columns had previously been requested, but I haven't thought of this solution before. If I understand your suggestion correctly, I think it would work for Prism data that only contain Y values, but not those containing both X and Y, is that right?

SeaGenGosink commented 2 years ago

(Edit: trying to make the XML more readable in the GitHub window) Hmm, I haven't looked into the Prism XML specification that deeply, but I'm looking at an example from one of my colleagues that has the form below. That is, within a "YColumn" element, there may be several "Subcolumn" elements, each with multiple data point nodes.

etc.
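For readers following along, here is a rough sketch of the nesting described above, built with the xml2 package. It is not the colleague's file: the `Subcolumns` attribute, the `Title` element, the `d` data point nodes, and the numbers are assumptions used only to illustrate the shape.

```r
library(xml2)

# Assumed layout: one <YColumn> holding a <Title> and several <Subcolumn>
# elements, each with one <d> node per data point. Values are invented.
replicates <- list(c(10.1, 7.2), c(9.8, 7.0), c(10.3, 7.5))

ycol <- xml_new_root("YColumn", Subcolumns = as.character(length(replicates)))
xml_add_child(ycol, "Title", "Control")

for (values in replicates) {
  # xml_add_child() returns the newly created node, so data points can be
  # attached to each subcolumn in turn.
  subcolumn <- xml_add_child(ycol, "Subcolumn")
  for (v in values) {
    xml_add_child(subcolumn, "d", as.character(v))
  }
}

# Print the resulting XML to inspect the structure.
cat(as.character(ycol))
```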

Curiously, in the one example I looked at there seem to be two designations for the X column information: both an "XColumn" and an "XAdvancedColumn". Again, I don't know anything about the Prism XML format other than what I see here, so I don't know if the double accounting for the x-axis information is normal...

SeaGenGosink commented 2 years ago

Hi Yue, let me know if you have any questions about this or if there is any way I could help. Thanks!

Yue-Jiang commented 2 years ago

Sorry for the delay, and thanks for the info. Indeed this is an interesting idea I hadn't thought of before. However, I still struggle to see how to make your proposal generalizable. I think it will work in many cases by transposing the table and making the X column the column titles in the new transposed table, but there are cases where I don't think it's a good solution, for example an XY table with both an X column and row titles. In that case, one would have to combine the row titles and X column values to make the new column names. In general, I'm a little hesitant about manipulating the class of the X column and about wrangling the original table (transposing and adding columns). I still think your desired output is probably best achieved by reading the table in and then manipulating it with dplyr/tidyr-like approaches.

As for the XAdvancedColumn, as far as I understand it is only useful when X is a date, in which case it contains a character version of the date to supplement the XColumn, which is numeric. In other cases I haven't found it useful. But it's another example where transposing the table changes the original table too much: the current implementation lets people choose whether they want the numeric date, the character date, or both as columns.

I'm leaning toward providing an option to output grouped columns as list-columns so people can unnest as they wish later on, but I'm not sure how popular list-columns are.
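To make the list-column idea concrete, here is a small sketch using tibble and tidyr. The table and its values are invented, and this is not output from pzfx; it only shows how a grouped column stored as a list-column could be unnested afterwards.

```r
library(tibble)
library(tidyr)

# Hypothetical grouped table: each cell of Control/Treated holds the
# replicate values (subcolumns) for that row.
grouped <- tibble(
  time    = c(0, 1),
  Control = list(c(10.1, 9.8, 10.3), c(7.2, 7.0, 7.5)),
  Treated = list(c(10.0, 10.2, 9.9), c(4.1, 4.3, 3.9))
)

# Unnest both list-columns in parallel: one output row per replicate,
# with `time` repeated as needed.
long <- unnest(grouped, cols = c(Control, Treated))
```

The two list-columns must have matching lengths within each row for the parallel unnest to work, which holds when every group has the same number of subcolumns.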

SeaGenGosink commented 2 years ago

Hmm, I think there are issues here I don't understand in terms of the intermediate code and data structures in your library, and/or the expected XML format. I see that the demo data you supply with the package (exponential_decay.pzfx) includes subcolumns (i.e., 3 control and 3 treated) and the package can read it in with no problem, but it can't write out such a table?

To clarify, I'm only interested in writing Prism files, not reading them... at least for now :-)

Thanks again for all your work on this!

Yue-Jiang commented 2 years ago

I see! I think that could be doable, I'll try to figure something out when I find time. Thanks for the clarification!

nickp60 commented 1 year ago

We would also appreciate this option if it's doable!

SeaGenGosink commented 1 year ago

@nickp60 (and also @Yue-Jiang), in fact @phoward38 and @zross have spent a fair bit of time working on this and related issues. I hired them as contractors to rebuild a Shiny app I prototyped. We made a dead-end fork of the package and mangled it quite a bit to get it to work for us on a tight timeline. Perhaps it's time to give back to the community and merge what we have back into master?

I'm also having them work on writing data into multiple tables in the Prism file (each with user definable names), as well as writing mean, median, N, & SD type data. Oh, yeah, also survival data with censoring etc.

Sorry for working 'off the grid', we were in a rush and as I said, really had to do a hack job to get things going a while back. I'll touch base with Patrick and Zev about re-merging with the base... if that is OK with you @Yue-Jiang.

Yue-Jiang commented 1 year ago

@SeaGenGosink That would be awesome! Sorry I haven't really had any time for open source work over the past year. If your team would be willing to submit a PR I'll be happy to review it! Thanks so much!

BenjaminSchott commented 9 months ago

Hi everyone. Thanks for the great tool @Yue-Jiang! Has anyone managed to implement grouped tables support into write_pzfx?

SeaGenGosink commented 9 months ago

Yeah, we have a hard fork of the original code that permits grouped tables. Unfortunately the developers who were working on it merged the code into another library of code. Someone please advise how I can pull out the 2 blocks of code that are needed to do this and merge it into the original pzfx library here.

Yue-Jiang commented 9 months ago

Is the other library a public repo?

SeaGenGosink commented 9 months ago

Alas, no. It's two functions (an improved write_pzfx and a helper function [with a weird data block apparently required by Prism]) which are in an internal library. Should I just email them to you?

Yue-Jiang commented 9 months ago

Sure that would work; I'll try to incorporate it. Thank you!

SeaGenGosink commented 9 months ago

Hi Yue,

See attached 2 files. One has the main function write_pzfx_grouped() and the other is 2 binary data objects wrapped in strings. Some comments:

- I attempted to disentangle the code here as best I could from what the developers left for me. Quite frankly, it is hard for me to follow what they’ve done. It seems way more complicated than I would expect. It might take some refactoring.
- There is a hidden requirement for columns labelled ‘experiment’ and ‘project’ in the data frame (tibble) that you are trying to write to Prism. Each experiment gets its own data table. I don’t think the ‘project’ column does anything, but I’m not sure.
- There is some bizarre requirement by GraphPad Prism to have those two binary objects attached to the outgoing file. Thus the systdata.rda needs to be loaded into the local environment in order for the code to work. I don’t know why they did it that way, or why Prism even wants the binary objects.
- The system supports writing “raw” data as well as data where the means and standard deviations have been pre-computed. There are other precomputed stats that could be exported to Prism, but my developers didn’t get around to coding them.
- The code also supports writing Kaplan-Meier type (“km-_survival”) data. There may be some hidden requirement there too.
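As an illustration of the ‘experiment’ requirement noted above, the following base R sketch splits one data frame into one table per experiment. It is not the attached write_pzfx_grouped() code; the data frame `results` and its values are invented.

```r
# Hypothetical input with the two bookkeeping columns described above.
results <- data.frame(
  project    = "demo",
  experiment = c("exp_A", "exp_A", "exp_B", "exp_B"),
  group      = c("Control", "Treated", "Control", "Treated"),
  value      = c(1.2, 3.4, 1.1, 2.9),
  stringsAsFactors = FALSE
)

# One data frame per experiment, with the bookkeeping columns dropped
# so that only the columns destined for the Prism table remain.
tables <- lapply(
  split(results, results$experiment),
  function(df) df[, setdiff(names(df), c("project", "experiment")), drop = FALSE]
)

names(tables)
```

A named list of data frames like `tables` is, as far as I know, also the shape the upstream write_pzfx() accepts when writing several tables to one file, so the split step could stay separate from the writer.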

Apologies for the crappy state of affairs; it took me an hour just to get it disentangled enough to send it to you here. Please contact me if you have any questions.

-John

> From: Yue Jiang. Sent: Saturday, October 7, 2023 12:17 PM. Subject: Re: [Yue-Jiang/pzfx] Grouped columns (#11)
>
> Sure that would work; I'll try to incorporate it. Thank you!

Yue-Jiang commented 9 months ago

Thanks John, although the attachment doesn't seem to work here on GitHub. You might want to send it to my personal email, rivehill[at]gmail.com

Yue-Jiang commented 6 months ago

This might be of interest to people on this thread: starting with Prism 10, GraphPad introduced a new file format, .prism, which is just a zipped folder of flat files. That might be an easier route to write to. https://www.graphpad.com/guides/prism/latest/user-guide/prism_file_format.htm
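If the format really is a plain zip archive, its contents can be listed from R without extracting anything. A small sketch, assuming a standard zip container; `list_prism_files` is a hypothetical helper, not part of pzfx.

```r
# Hypothetical helper: list the files inside a .prism archive without
# extracting it. Assumes the file is an ordinary zip container.
list_prism_files <- function(path) {
  stopifnot(file.exists(path))
  utils::unzip(path, list = TRUE)$Name
}
```

Calling it on a real file, e.g. `list_prism_files("my_project.prism")` (placeholder path), would show which flat files one has to produce to write the format.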

Yue-Jiang / pzfx

Grouped columns #11