ByrumLab / proteoDA

GNU General Public License v3.0

allow for multiple filtering in subset_targets #25

Closed tjthurman closed 2 years ago

tjthurman commented 2 years ago

I think both @jbird9 and @dhalkam have run into this issue at different points. Currently, subset_targets() only filters on one column at a time (though you can filter out multiple strings for that column). So, filtering on multiple columns (e.g., filtering out "Pool" samples from the group column, then filtering specific samples from the sample column) requires running subset_targets() twice. This is also a little unintuitive: in the pipeline, the main input to subset_targets() is the targets data frame made by make_targets(), but the output of subset_targets() is a list whose first element is the targets data frame. So, to filter twice, the current code is kind of awkward:

targets <- make_targets(...)
filter1 <- subset_targets(targets, filter_column = "group", rm.vals = "Pool")
final_targets <- subset_targets(filter1$targets, filter_column = "sample", rm.vals = c("sampleA", "sampleB"))

It might be nice to change this so that all the filtering happens in one call, maybe by passing a named list where each name is a column and each element holds the values to filter out of that column, e.g.:

targets <- make_targets(...)
final_targets <- subset_targets(targets, 
                                filter = list("group" = c("Pool"),
                                              "sample" = c("sampleA", "sampleB")))
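
For what it's worth, here is a rough sketch of how the named-list filtering could work internally. This is only an illustration, not the package's actual implementation: the function name subset_targets_multi, the argument name filter_list, and the "removed" element of the return value are all hypothetical. It returns a list with the targets data frame first, to match the current behavior described above.

```r
# Hypothetical sketch, NOT the actual subset_targets() implementation.
# `filter_list` is a named list: each name is a column in `targets`,
# each element is a vector of values to remove from that column.
subset_targets_multi <- function(targets, filter_list) {
  # Every element must be named, since names identify the columns
  if (is.null(names(filter_list)) || any(names(filter_list) == "")) {
    stop("filter_list must be a named list")
  }
  # Fail early if a requested column does not exist
  missing_cols <- setdiff(names(filter_list), colnames(targets))
  if (length(missing_cols) > 0) {
    stop("Column(s) not found in targets: ",
         paste(missing_cols, collapse = ", "))
  }
  # A row is kept only if it passes the filter for every column
  keep <- rep(TRUE, nrow(targets))
  for (col in names(filter_list)) {
    keep <- keep & !(targets[[col]] %in% filter_list[[col]])
  }
  # Return a list with the filtered targets as the first element;
  # the `removed` element is an assumption added for illustration
  list(targets = targets[keep, , drop = FALSE],
       removed = targets[!keep, , drop = FALSE])
}

# Toy data standing in for the output of make_targets()
toy_targets <- data.frame(
  sample = c("sampleA", "sampleB", "sampleC", "sampleD"),
  group = c("Control", "Treatment", "Treatment", "Pool"),
  stringsAsFactors = FALSE
)

res <- subset_targets_multi(
  toy_targets,
  list(group = "Pool", sample = c("sampleA", "sampleB"))
)
res$targets
```

With the toy data, only the sampleC row survives both filters. Building one logical keep vector across all columns means the order of the list elements does not matter.
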
clw09 commented 2 years ago

sounds good

Cheers!

Charity L. Washam, PhD

