PathwayCommons / pathway-abstract-classifier

A tool to classify journal abstracts with pathway content.
MIT License

Running and reproducing results #11

Closed jvwong closed 2 years ago

jvwong commented 2 years ago

My experience trying to follow the README:

Running

I followed the Installation and copied the Quickstart code into a main.py in the top level directory. Then I ran python main.py and got a bunch of warnings:

site-packages/ktrain/text/preprocessor.py:216: UserWarning: List or array of two texts supplied, so task being treated as text classification. If this is a sentence pair classification task, please cast to tuple.
  warnings.warn('List or array of two texts supplied, so task being treated as text classification. ' +\

I think it worked but I'm not sure - What should I expect?

jvwong commented 2 years ago

Other notes:

Steven-Palayew commented 2 years ago

My experience trying to follow the README:

  • What version of python do I need?

    • I am using 3.8 but my conda environment defaulted to 2.7
  • What type of performance should I expect on some typical hardware?

    • If I run the sample code, how long is this going to take? What should I expect?
  • Typo (extra closing bracket) in the README

    • predictions = model.predict(texts))

  • Once you have requirements (there are only 3 - see requirements.txt) installed, you can simply run:

    • what does 'run' mean?
    • how do I know if it worked?

Hope this helps! 😄

jvwong commented 2 years ago

Maybe think about updating the README to address some of these. Also, here's an example of documenting important functions:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/join

Steven-Palayew commented 2 years ago

I used the test_model.py code (which the Quickstart is based off of) and runtime on my laptop is just over 20 seconds. My laptop has an Intel i7-1165G7 and 16 GB of RAM.
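For anyone who wants to report a comparable number, here is a minimal timing sketch using only the standard library. `run_quickstart` is a hypothetical stand-in for whatever is being timed (for example, the body of test_model.py); it is not part of this repository.

```python
import time


def run_quickstart():
    # Hypothetical placeholder: replace with the code being timed,
    # e.g. loading the model and calling predict on the sample texts.
    return sum(range(1000))


def time_call(func):
    """Return (result, elapsed_seconds) for a single call to func."""
    start = time.perf_counter()
    result = func()
    elapsed = time.perf_counter() - start
    return result, elapsed


result, elapsed = time_call(run_quickstart)
print(f"Finished in {elapsed:.2f} s")
```

A single run includes one-off costs such as loading the model, so timings from repeated runs may differ.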

JohnGiorgi commented 2 years ago

RE Python version: @Steven-Palayew Can we add this line right under Installation: "This repository requires Python 3.7 or later"
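As a rough sketch of how a script could enforce that requirement itself, the check below compares the running interpreter against the 3.7 minimum proposed above. The helper name `python_is_supported` is made up for illustration and does not exist in this repository.

```python
import sys

# Minimum version taken from the README line suggested above.
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    found = ".".join(str(part) for part in sys.version_info[:3])
    raise SystemExit(f"Python 3.7 or later is required, found {found}")
```

Running `python --version` inside the conda environment before installing gives the same information without any code.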

RE What does "run" mean: @jvwong This just means running the code with any Python interpreter.

RE How do you know if it works: @jvwong This is what the assert statement is for. If it didn't work, the assert statement will raise an error.
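To illustrate the behaviour described above, here is a small sketch of how an assert signals success or failure. The function name and the prediction values are hypothetical and are not taken from the Quickstart.

```python
def check_predictions(predictions, expected):
    # assert does nothing when the condition is true and raises
    # AssertionError (with the given message) when it is false.
    assert predictions == expected, f"expected {expected}, got {predictions}"
    return True


# Hypothetical values standing in for real model output.
check_predictions([1, 0], [1, 0])  # passes silently

try:
    check_predictions([1, 1], [1, 0])
except AssertionError as err:
    print(f"check failed: {err}")
```

So a script that finishes without an AssertionError passed its check. Note that Python skips assert statements when started with the `-O` flag.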

RE The warnings: These are unfortunately out of our control as they are being logged by our dependencies. We could add some boilerplate to remove them but IMO that would be even more confusing. Another option would be to add a note in the readme under the code snippet in quickstart, something like:

ktrain may throw a UserWarning which you can safely ignore.
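For reference, the kind of boilerplate that would hide the warnings might look like the sketch below, which uses only the standard library `warnings` module. `noisy_predict` is a hypothetical stand-in for the call that triggers the warning, not the real ktrain or repository API.

```python
import warnings


def noisy_predict(texts):
    # Hypothetical stand-in for a library call that emits a UserWarning.
    warnings.warn("List or array of two texts supplied", UserWarning)
    return [len(text) for text in texts]


with warnings.catch_warnings():
    # Ignore UserWarning only inside this block; the previous warning
    # filters are restored when the block exits.
    warnings.simplefilter("ignore", category=UserWarning)
    predictions = noisy_predict(["first text", "second text"])

print(predictions)
```

Scoping the filter to a `with` block keeps unrelated warnings visible elsewhere in the program.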

RE The SEP token: @jvwong This is a soft requirement of the pre-trained model we are using. It's used during training to separate the title from the abstract for the model, so we have to include it when making predictions. @Steven-Palayew I think this will be less confusing once #8 is addressed.

@jvwong Thanks for the feedback, this will improve the readme. @Steven-Palayew I wonder if we should also provide a quickstart notebook and link it to colab? Then a user can actually see the whole process end-to-end.
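As a sketch of what including the token could look like when preparing inputs, the helper below joins a title and an abstract with `[SEP]` between them. The helper name and the single space on either side of the token are assumptions; the exact format the model expects should be checked against the README.

```python
SEP_TOKEN = "[SEP]"


def join_title_and_abstract(title, abstract, sep=SEP_TOKEN):
    """Join a title and an abstract with the separator token between them."""
    # Assumption: one space on each side of the token.
    return f"{title.strip()} {sep} {abstract.strip()}"


# Made-up example strings, not real article data.
texts = [join_title_and_abstract(" A title ", "An abstract.")]
print(texts[0])
```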

Steven-Palayew commented 2 years ago

@JohnGiorgi I made a PR addressing #12 and, by extension, the first few suggestions you brought up here. As for the last point, I added a tutorial and linked it in the README, which I believe addresses this. I still want to modify it based on your suggestion about the [SEP] tokens, and if @jvwong can package the code to go from UIDs -> titles + abstracts by the end of the week, I can also include an example of how that pipeline (UIDs -> classifications) would work.

JohnGiorgi commented 2 years ago

Cool, I would add a colab link to the readme: https://colab.research.google.com/github/PathwayCommons/pathway-abstract-classifier/blob/main/Tutorial.ipynb. That way someone can try it out quickly in the browser. This is also a good place to show the installation process.