biocore / wetlab-assistant

BSD 3-Clause "New" or "Revised" License
0 stars 2 forks source link

added sample name validation #12

Closed qiyunzhu closed 7 years ago

qiyunzhu commented 7 years ago

Hello all @antgonza @josenavas @mortonjt @sjanssen2

This PR contains a new function, which checks if the sample names contained in the plate map are identical to a given sample name list. Currently, Gail sends the sample name list to the wet lab, and later manually validates the sample names in the mapping file she gets back from the wet lab. This process is quite time consuming. Therefore I implemented this new function after talking with Gail and Greg.

Specifically, this new function will pick out the sample names that 1) are present in the mapping file but not in the name list, 2) are present in the name list but not in the mapping file, 3) have duplicates in the mapping file.

When any of these instances are detected, the program will display a warning message to the user (wet lab). Note that it is a warning instead of an error, because in many cases the user intensionally makes these exceptions (e.g., adding a blank sample, excluding a certain sample). The user will then examine the warning message to see if everything is expected.

For the whole upgrade plan please refer to my email on Jan 5.

Meanwhile, I addressed @mortonjt 's suggestion: moving test data from the script to the external file system. You can find the files in the data/ directory.

Thanks and happy new year!

rob-knight commented 7 years ago

Great, this will be very helpful! Excited to see these wet lab tools moving.

On Jan 5, 2017, at 11:11 AM, Qiyun Zhu notifications@github.com<mailto:notifications@github.com> wrote:

Hello all @antgonzahttps://github.com/antgonza @josenavashttps://github.com/josenavas @mortonjthttps://github.com/mortonjt @sjanssen2https://github.com/sjanssen2

This PR contains a new function, which checks if the sample names contained in the plate map are identical to a given sample name list. Currently, Gail sends the sample name list to the wet lab, and later manually validates the sample names in the mapping file she gets back from the wet lab. This process is quite time consuming. Therefore I implemented this new function after talking with Gail and Greg.

Specifically, this new function will pick out the sample names that 1) are present in the mapping file but not in the name list, 2) are present in the name list but not in the mapping file, 3) have duplicates in the mapping file.

When any of these instances are detected, the program will display a warning message to the user (wet lab). Note that it is a warning instead of an error, because in many cases the user intensionally makes these exceptions (e.g., adding a blank sample, excluding a certain sample). The user will then examine the warning message to see if everything is expected.

For the whole upgrade plan please refer to my email on Jan 5.

Meanwhile, I addressed @mortonjthttps://github.com/mortonjt 's suggestion: moving test data from the script to the external file system. You can find the files in the data/ directory.

Thanks and happy new year!


You can view, comment on, or merge this pull request online at:

https://github.com/biocore/wetlab-assistant/pull/12

Commit Summary

File Changes

Patch Links:

- You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/biocore/wetlab-assistant/pull/12, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AB69KcXPIW-E14RwMlYw9yeBmj8iar_gks5rPUBNgaJpZM4LcBbd.

mortonjt commented 7 years ago

Do you think it is worthwhile to include the case where a user has multiple spaces in addition to tabs? It is common to accidentally embed extra spaces in excel

qiyunzhu commented 7 years ago

@mortonjt That's a good question! I guess some column headers and (mistyped) sample names may contain spaces, which will complicate the situation. Do you suggest we should treat continuous spaces as a tab, or we should trim off leading and trailing spaces of each cell?

qiyunzhu commented 7 years ago

@antgonza Thank you very much for the very careful comments! I will address them one by one. I am completely fine with coveralls. How can we add this?

mortonjt commented 7 years ago

@qiyunzhu re splitting whitespace

You can split arbituary whitespace with split

In [1]: s = "s sss  \t   t"

In [2]: s.split()
Out[2]: ['s', 'sss', 't']

I know that pandas can have this issue when parsing tables.

But I would need a little more context to understand how the wetlab is using this information. Are they working with excel, or just plain text?

qiyunzhu commented 7 years ago

@mortonjt The way they work is that they copy & paste the Google Sheet / Excel contents into a text file. This will guarantee that cells are separated by tabs. The script will work on the text files. Meanwhile, some cells may contain spaces (this is normal, I hope not but I guess it happens). Therefore simple s.split() may not work properly.

mortonjt commented 7 years ago

Ok - in that case, it is probably good for now. If there are problems - we can patch them as appropriate.

qiyunzhu commented 7 years ago

@antgonza @mortonjt Now I figured out a better way: Use Python's warnings module instead of plain print. Warnings are easily testable. Meanwhile I put "validate" back to the main function, because now I can test warning instead of function return.

coveralls commented 7 years ago

Coverage Status

Changes Unknown when pulling e0b0617db993420ae289538c10a1bb107e146358 on qiyunzhu:validate into on biocore:master.

coveralls commented 7 years ago

Coverage Status

Changes Unknown when pulling a50aae0d905174ae8d8a0b88bf917f696b3f2072 on qiyunzhu:validate into on biocore:master.

mortonjt commented 7 years ago

Looks good to me. @antgonza want to do one last look before merging?

wasade commented 7 years ago

just two minor comments

coveralls commented 7 years ago

Coverage Status

Changes Unknown when pulling 475e5f5ec36651faf3986f896a0e8fe12bb82fec on qiyunzhu:validate into on biocore:master.

coveralls commented 7 years ago

Coverage Status

Changes Unknown when pulling 475e5f5ec36651faf3986f896a0e8fe12bb82fec on qiyunzhu:validate into on biocore:master.

qiyunzhu commented 7 years ago

@mortonjt @wasade @antgonza anyone would like to merge? Thanks!

qiyunzhu commented 7 years ago

@antgonza @mortonjt @wasade Thank you very much!!