AndrewC160 / ROMOPOmics

R package to parse datasets into SQLite-friendly tables

detect need to transpose input #25

Status: Open. ngiangre opened this issue 3 years ago.

ngiangre commented 3 years ago

The package should determine whether the input and masks need to be transposed; it shouldn't be left up to the user. It's not intuitive what the transpose flag should be set to.

AndrewC160 commented 3 years ago (Wed, Jan 20, 2021, 8:07 PM)

Should we just establish a convention like: "Input metadata tables should be oriented with one patient per column. If a transposed table is used, set the argument 'transpose_input_table'."?

Otherwise, detecting the orientation isn't obvious: the table could be in long format, or it could simply be a dataset with lots of fields.
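A minimal sketch of that convention, written in Python for illustration even though ROMOPOmics is an R package. It assumes the input is a rectangular list of rows of strings; `orient_input_table` is a hypothetical helper, and `transpose_input_table` only mirrors the argument name floated here, not a confirmed ROMOPOmics signature. The flag defaults to False, so the documented orientation (one patient per column) is what the function trusts unless told otherwise.

```python
from typing import List

Table = List[List[str]]


def transpose(table: Table) -> Table:
    """Swap rows and columns of a rectangular table given as a list of rows."""
    return [list(column) for column in zip(*table)]


def orient_input_table(table: Table, transpose_input_table: bool = False) -> Table:
    """Return the table in the assumed convention: one patient per column.

    Hypothetical helper; 'transpose_input_table' mirrors the argument name
    proposed in this thread and is not a confirmed ROMOPOmics API.
    """
    if not table or any(len(row) != len(table[0]) for row in table):
        raise ValueError("input table must be non-empty and rectangular")
    if transpose_input_table:
        # Caller says patients are rows, so flip to patients-as-columns.
        return transpose(table)
    # Default: trust the documented orientation and return an unchanged copy.
    return [list(row) for row in table]


# One patient per row, i.e. transposed relative to the convention.
patients_as_rows = [
    ["sample", "age", "sex"],
    ["pt1", "54", "F"],
    ["pt2", "61", "M"],
]
oriented = orient_input_table(patients_as_rows, transpose_input_table=True)
print(oriented[0])  # ['sample', 'pt1', 'pt2']
```

The explicit flag keeps the function predictable: it never guesses, so a wrong orientation is always a user-visible argument choice rather than a silent heuristic failure.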

ngiangre commented 3 years ago

I like having the input metadata conform to a standard, but we would need checks for the format. If we can run a check and transpose automatically, then we don't need the argument; we'd just say the table can be oriented either way. Does that make sense? I like having fewer arguments.

-- Cheers,

Nick Giangreco Columbia University Systems Biology Ph.D. Student 716.713.2124

AndrewC160 commented 3 years ago (Wed, Jan 20, 2021, 8:34 PM)

I agree, but what would we use to decide the orientation? If the user provides a CSV/TSV without identifiable column/row names, we'd need to figure out which entries are sample names and which are metadata values. If the dataset is sufficiently large, we could compare its length to its width, i.e. "Either they collected 30,000 metadata values or this table is long," but that's not a guarantee, especially with the test data we use and that people will probably feed in.
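The length/width check could look roughly like this Python sketch (the package itself is R). `guess_needs_transpose` and the 10:1 `ratio` threshold are hypothetical choices for illustration. It assumes the one-patient-per-column convention, so rows are metadata fields, and it deliberately returns None for small or roughly square tables, which is exactly the test-data case where shape alone is no guarantee.

```python
from typing import Optional


def guess_needs_transpose(n_rows: int, n_cols: int, ratio: float = 10.0) -> Optional[bool]:
    """Guess orientation from table shape alone (hypothetical heuristic).

    Assumes the convention of one patient per column, so rows are fields.
    - Far more rows than columns: probably one patient per row -> True.
    - Far more columns than rows: consistent with the convention -> False.
    - Otherwise ambiguous -> None; fall back to an explicit flag.
    """
    if n_rows <= 0 or n_cols <= 0:
        raise ValueError("table dimensions must be positive")
    if n_rows >= ratio * n_cols:
        return True
    if n_cols >= ratio * n_rows:
        return False
    return None


print(guess_needs_transpose(30000, 12))  # True
print(guess_needs_transpose(12, 30000))  # False
print(guess_needs_transpose(8, 5))       # None
```

Returning a three-way answer instead of a boolean keeps the heuristic honest: it only acts when the shape is lopsided enough to be convincing.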

ngiangre commented 3 years ago

You're right. So maybe we should keep the parameter but be explicit about the default format?

AndrewC160 commented 3 years ago (Wed, Jan 20, 2021, 8:41 PM)

Another option is to have the package figure it out when the user provides a mask ("Does row 1 or column 1 contain more of the following aliases?"), but I think that would require some reconfiguring of the workflow.
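The mask-based check could be sketched as below, again in Python for illustration. `needs_transpose_by_aliases` is a hypothetical helper, and the `aliases` set stands in for whatever field aliases a mask provides; how ROMOPOmics actually stores mask aliases isn't shown in this thread. Matching is case- and whitespace-insensitive, and a tie returns None so the caller can fall back to an explicit flag.

```python
from typing import List, Optional, Set


def needs_transpose_by_aliases(table: List[List[str]], aliases: Set[str]) -> Optional[bool]:
    """Compare alias hits in the first row vs. the first column (hypothetical).

    Under the one-patient-per-column convention, metadata field names (the
    kind of names a mask lists as aliases) run down the first column. If the
    first row matches more aliases, the table is probably transposed.
    """
    if not table or not table[0]:
        raise ValueError("table must be non-empty")
    normalized = {a.strip().lower() for a in aliases}
    row_hits = sum(cell.strip().lower() in normalized for cell in table[0])
    col_hits = sum(row[0].strip().lower() in normalized for row in table if row)
    if row_hits > col_hits:
        return True
    if col_hits > row_hits:
        return False
    return None  # Tie: no evidence either way.


conventional = [
    ["sample", "pt1", "pt2"],
    ["age", "54", "61"],
    ["sex", "F", "M"],
]
transposed = [list(column) for column in zip(*conventional)]
aliases = {"age", "sex", "gender"}
print(needs_transpose_by_aliases(conventional, aliases))  # False
print(needs_transpose_by_aliases(transposed, aliases))    # True
```

The corner cell is counted in both the row and the column, so it can never break a tie on its own.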

ngiangre commented 3 years ago

Let's take the path of least resistance first!