biocore / metagenomics_pooling_notebook

Jupyter notebooks to assist with sample processing
MIT License
8 stars 16 forks source link

Updated read_plate_map_csv to fail with repeated sample names #39

Closed cguccione closed 2 years ago

cguccione commented 2 years ago

Fixed #22 Updated read_plate_map_csv notebook to fail with repeated sample names. Removed repeated sample names check step in metagenomics_repooling.ipynb and caused user to not be able to continue with the notebook if there are repeated sample names. Added test case to ensure an error is thrown with repeat sample names in test_metapool.py.

cguccione commented 2 years ago

I'm not sure why I am failing some of these checks still, does anyone have an idea on how to resolve this?

ElDeveloper commented 2 years ago

@cguccione the failures are related to flake8, try running flake8 metapool from the base directory. If that doesn't yield any insights try chatting with @callaband or @ahdilmore they can probably help sort this out.

These are the failures that I'm seeing out of flake8:

metapool/metapool.py:49:1: W293 blank line contains whitespace
metapool/tests/test_metapool.py:137:1: W293 blank line contains whitespace
metapool/tests/test_metapool.py:139:13: F841 local variable 'obs_plate_df' is assigned to but never used
metapool/tests/test_metapool.py:140:1: W293 blank line contains whitespace

Can you also update the amplicon-pooling notebook and any other notebooks where there's a check for repeated sample names?

cguccione commented 2 years ago

@ElDeveloper I was able to fix the errors using flake8. I also made changes to amplicon-pooling, metagenomcis_pipeline_MiniPiconorm, metagenomics_pipeline_qPCRnorm, and metageneomics_samplesheet_generator to be identical to the original fix in read_plate_map_csv so they should all handle repeated sample names the same correct way now.

ElDeveloper commented 2 years ago

Thanks so much @cguccione! This is very helpful and I think it looks ready to merge, however it looks like there's a lot of changes that came in with the notebooks, do you know what might be going on there?

cguccione commented 2 years ago

Hi @ElDeveloper ! I've looked through my pull request a few times and I can't find any changes other than the ones I intended to add along with changes to the output of some of the cells in the jupyter notebooks (that seems to be where most of the changes are coming from). I'm also not sure why it appears that amplicon-pooling.ipynb notebook isn't appearing for you, because I can see it in my commit. I'm happy to set up a meeting with @ahdilmore and @kfogelso and we can figure out how to resolve this issue if that helps! Thanks for all your help.

ElDeveloper commented 2 years ago

Hi @cguccione, looks like I made a mistake. You didn't delete amplicon-pooling.ipynb. Thanks for catching that. I think the only thing that should be updated is the fact that the notebooks currently have error messages in the committed version. For example compare cell 10 in this notebook from your fork with cell 12 in this notebook. As you can tell one of them as an exception's traceback and the other one has the "usual output". I think you should be able to rerun the notebook and as long as there are no errors all should be good to go. Would you mind making sure that the same is true for other notebooks (at least metagenomics_samplesheet_generator.ipynb shows a similar problem). Please let me know if I can help.

cguccione commented 2 years ago

Hi @ElDeveloper! I fixed some of the errors, but I'm having issues running the notebooks without any errors. I tried rebasing my fork in an attempt to resolve any errors, but that did not fix all the issues. These are the main two I am running into which are causing problems downstream in the notebooks and causing there to be major differences when pushing to git.

  1. In the 'metagenomics_pipeline_MiniPiconorm' notebook I get the following errors in the Demux Summary step : FileNotFoundError: [Errno 2] No such file or directory: './test_data/Demux/Stats.json'

    • I'm not sure where to locate this file, it doesn't appear to be anywhere on Github, but when others ran the notebook (I was looking last at Hazel's commit) they had no issue running this cell.
    • This is causing a lot of errors downstream in the notebook that isn't fixable without this file.
  2. In the 'metagenomics_pipeline_qPCRnorm' notebook I get the following error in the 'Step 2: format sample sheet data' section. KeyError: "['Project Plate', 'Project Name'] not in index"

    • I'm not sure how to fix this either and is once again causing downstream errors.

I am happy to meet with @ahdilmore or someone who has recently submitted a successful pull request in order to see how they accounted for these errors or what I may have changed to cause them. Thank you for all your help and I apologize this process is taking so long.

ahdilmore commented 2 years ago

@cguccione @ElDeveloper I'm not able to replicate the first error, but I did replicate the second. I'm wondering if others are able to replicate such as @charles-cowart @antgonza

cguccione commented 2 years ago

Thank you so much @ahdilmore ! I am also noticing that some of my packages are different versions than yours so I can try updating mine and rerunning the 'metagenomics_pipeline_MiniPiconorm' notebook to see if I can resolve the issue.

charles-cowart commented 2 years ago

Just in general I’ve noticed a few things like unittests will break if I create a brand-new Conda environment using the instructions in the README. Python 3.10 came out recently, and if the new environment uses Python 3.10, some paths have been deprecated or moved, some package versions may change, etc.

On Dec 7, 2021, at 10:56 AM, Caitlin Guccione @.***> wrote:

@cguccione https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_cguccione&d=DwMFaQ&c=-35OiAkTchMrZOngvJPOeA&r=r9g1PxFQBBL7eQ2L8OAF0l29pckeyaar0r5s1gYYuws&m=VfsQlDRPs7Ges7PeLaGBM9jzBp0evljE4IGFUA-Noe6zBve7DxembKQCEroZBTx0&s=9ExwW6TpRtzPjKaGwSVQ21zDgq4OvfpSr9thY0olP_E&e= @ElDeveloper https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ElDeveloper&d=DwMFaQ&c=-35OiAkTchMrZOngvJPOeA&r=r9g1PxFQBBL7eQ2L8OAF0l29pckeyaar0r5s1gYYuws&m=VfsQlDRPs7Ges7PeLaGBM9jzBp0evljE4IGFUA-Noe6zBve7DxembKQCEroZBTx0&s=jLZ0xO-iXaYcI2A7aP2Zq5MHNG_JP_-SqeCy-qZLKWI&e= I'm not able to replicate the first error, but I did replicate the second. I'm wondering if others are able to replicate such as @charles-cowart https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_charles-2Dcowart&d=DwMFaQ&c=-35OiAkTchMrZOngvJPOeA&r=r9g1PxFQBBL7eQ2L8OAF0l29pckeyaar0r5s1gYYuws&m=VfsQlDRPs7Ges7PeLaGBM9jzBp0evljE4IGFUA-Noe6zBve7DxembKQCEroZBTx0&s=TiviN3QYjXU2P0XjrDnR-A3-oKu0Y6qgeoYF9MAlHJI&e= @antgonza https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_antgonza&d=DwMFaQ&c=-35OiAkTchMrZOngvJPOeA&r=r9g1PxFQBBL7eQ2L8OAF0l29pckeyaar0r5s1gYYuws&m=VfsQlDRPs7Ges7PeLaGBM9jzBp0evljE4IGFUA-Noe6zBve7DxembKQCEroZBTx0&s=42OkYboPzsZ3a0MxXWi1lpf0TQt59aTPj9NwGtpOP98&e= Thank you so much @ahdilmore https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ahdilmore&d=DwMFaQ&c=-35OiAkTchMrZOngvJPOeA&r=r9g1PxFQBBL7eQ2L8OAF0l29pckeyaar0r5s1gYYuws&m=VfsQlDRPs7Ges7PeLaGBM9jzBp0evljE4IGFUA-Noe6zBve7DxembKQCEroZBTx0&s=EyFhWdkDX7-gIt6BhsogwF82VY2-E5zhih_2h_8DZJM&e= ! I am also noticing that some of my packages are different versions than yours so I can try updating mine and rerunning the 'metagenomics_pipeline_MiniPiconorm' notebook to see if I can resolve the issue.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_biocore_metagenomics-5Fpooling-5Fnotebook_pull_39-23issuecomment-2D988186722&d=DwMFaQ&c=-35OiAkTchMrZOngvJPOeA&r=r9g1PxFQBBL7eQ2L8OAF0l29pckeyaar0r5s1gYYuws&m=VfsQlDRPs7Ges7PeLaGBM9jzBp0evljE4IGFUA-Noe6zBve7DxembKQCEroZBTx0&s=mp0T_gMJCG0wDoq1lvfAYbGVR30DUuioFTObY9fIS_Y&e=, or unsubscribe https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AKFU7EZYDUYZENM3LF2HOKTUPZKEFANCNFSM5E6LQSDQ&d=DwMFaQ&c=-35OiAkTchMrZOngvJPOeA&r=r9g1PxFQBBL7eQ2L8OAF0l29pckeyaar0r5s1gYYuws&m=VfsQlDRPs7Ges7PeLaGBM9jzBp0evljE4IGFUA-Noe6zBve7DxembKQCEroZBTx0&s=2cu_BY7EUrTQCnI4naV7KaL7qQipOytdtCdFmW4_H-A&e=. Triage notifications on the go with GitHub Mobile for iOS https://urldefense.proofpoint.com/v2/url?u=https-3A__apps.apple.com_app_apple-2Dstore_id1477376905-3Fct-3Dnotification-2Demail-26mt-3D8-26pt-3D524675&d=DwMFaQ&c=-35OiAkTchMrZOngvJPOeA&r=r9g1PxFQBBL7eQ2L8OAF0l29pckeyaar0r5s1gYYuws&m=VfsQlDRPs7Ges7PeLaGBM9jzBp0evljE4IGFUA-Noe6zBve7DxembKQCEroZBTx0&s=EJpbQKtd1fUMdNHm4OUTIiL8rd4Fi9RxbM0X85DPbNg&e= or Android https://urldefense.proofpoint.com/v2/url?u=https-3A__play.google.com_store_apps_details-3Fid-3Dcom.github.android-26referrer-3Dutm-5Fcampaign-253Dnotification-2Demail-2526utm-5Fmedium-253Demail-2526utm-5Fsource-253Dgithub&d=DwMFaQ&c=-35OiAkTchMrZOngvJPOeA&r=r9g1PxFQBBL7eQ2L8OAF0l29pckeyaar0r5s1gYYuws&m=VfsQlDRPs7Ges7PeLaGBM9jzBp0evljE4IGFUA-Noe6zBve7DxembKQCEroZBTx0&s=vlJ06lAE7ifDhRd2Quxar3Iw4yeFEDvFreBV4sDg4GE&e=.

RodolfoSalido commented 2 years ago

Files are zipped cause GitHub.

Stats.json.zip

Here's the stats.json file that you are looking for. I suspect not everyone has been running all of the modules in the notebook because they are not modifying the later portion of the notebook.

The second error arises because the input plate_map.tsv file currently in the test_data do not have the most up to date file structure. Old plate maps are missing the columns 'Project Plate' and 'Project Name'. Replacing that file with this one should fix the issue and clean up the repository.

Finrisk 33-36_plate_map.zip

cguccione commented 2 years ago

The Stats.json.zip file that @RodolfoSalido provided solved the issue. I agree that I believe it had gotten dropped before because I can't see it on the biocore:master fork of the Github. I added it back in with my 'Incorporate updated files to resolve errors in notebooks' commit above.

The second issue was easily solved by replacing Finrisk 33-36_plate_map.zip with your updated file. I also added the updated file in the 'Incorporate updated files to resolve errors in notebooks' commit above.

I am still getting a LinAlgError: singular matrix error in both the 'metagenomics_pipeline_MiniPiconorm' and the 'metagenomics_pipeline_qPCRnorm' notebooks. This isn't a major issue because it still allows me to move through the notebook and outputs a graph, although it does through a long error message. I tried deleting and reinstalling my conda environment to update all my packages in pooling_nb and I still seem to be running into the problem. Does anyone have solutions or should I just push with this minor error?

Thank you so much for your help! @charles-cowart @ahdilmore @RodolfoSalido !

RodolfoSalido commented 2 years ago

What you describe makes sense. I think the test_data/Demux/Stats.json directory got mistakenly dropped from the repository at one point. It should be there.

ElDeveloper commented 2 years ago

I agree with Rodolfo including the file in the repo (as you did) should be the way to go. This way we'll have the ability to fully run the notebooks without any hiccups. Can you send me a note when you add these changes then we can do another pass on the code and merge 👍

On Dec 7, 2021, at 3:13 40PM, RodolfoSalido @.***> wrote:

What you describe makes sense. I think the test_data/Demux/Stats.json directory got mistakenly dropped from the repository at one point. It should be there.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

ElDeveloper commented 2 years ago

@cguccione can you work with @charles-cowart to fix the merge conflcits for this PR?

cguccione commented 2 years ago

Hi @ElDeveloper @charles-cowart I looked through the merge conflicts it is just the new output from the jupyter cells that resulted when Charles made some updates, and the minor changes I made to the code in my initial commit which deal with the code failing with repeated sample names. I don't think there are any areas where @charles-cowart and I both changed code differently. @kfogelso and I were planning on meeting with @antgonza later today so I am happy to walk through all the changes in that meeting or if @charles-cowart wants to meet I can do that as well, but I think we should be able to proceed without meeting and I can go through and resolve conflicts.

In regards to the changes that you asked me to make ''Can you send me a note when you add these changes then we can do another pass on the code and merge 👍'', I'm not sure what else you wanted me to change? I added in the Stats.json.zip and the Finrisk 33-36_plate_map.zip file in the previous commit but I'm happy to make more updates if needed, please let me know what you meant. I apologize for not replying to this comment last week prior to the other pull request, I was traveling.

Thank you all for the help!