Open mkorvas opened 11 months ago
Thank you for letting us know! It's much appreciated!
Note, though, that v1.8.0 would probably not be installable in a clean environment anyway, due to the py2neo issue (see https://github.com/CogStack/MedCAT/pull/356).
With that said, I'll try and bump the versions in the tutorial to the latest medcat version (1.9.3).
That release fixed the py2neo issue. Plus, it doesn't impose the same limitation on transformers versions. In medcat~=1.8, transformers>=4.19.2,<4.22.0 is specified, which in turn pins tokenizers>=0.11.1,!=0.11.3,<0.13.
I've created a PR for the above.
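To make the pin chain above concrete, here is a minimal sketch that checks which tokenizers versions satisfy the constraint quoted for medcat~=1.8 (>=0.11.1,!=0.11.3,<0.13). The helper names are hypothetical, and the comparison is a simplified dotted-integer one, not full PEP 440 handling.

```python
def parse_version(text):
    """Turn a plain dotted version such as '0.12.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def tokenizers_allowed(version):
    """Check a version against >=0.11.1,!=0.11.3,<0.13 (as quoted above)."""
    v = parse_version(version)
    lower = parse_version("0.11.1")
    upper = parse_version("0.13")
    excluded = parse_version("0.11.3")
    return lower <= v < upper and v != excluded


for candidate in ("0.11.3", "0.12.1", "0.14.1"):
    print(candidate, tokenizers_allowed(candidate))
```

Under this constraint 0.12.1 is accepted while 0.14.1 is rejected, so a newer tokenizers release cannot be pulled in while staying on medcat~=1.8.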
Given such a prompt and kind response, let me mention a few more hiccups I encountered when exploring the tutorial. Some of them may be issues on my end: I haven't checked very thoroughly, and I ran the commands locally in an IPython shell or the system shell (for the system commands included in the Jupyter notebooks). Here is the list, anyway:
- I had to install seaborn manually -- I only saw a command to install medcat==1.8.0 in the tutorial and that did not already pull in seaborn.
- I had to install PyQt5 to make the plt.show() calls do something visible. (I guess this is likely caused by me not running the tutorial in the context of Jupyter.)
- The cat.cdb.print_stats() calls in sections 3.2 and 3.3 of the tutorial didn't have any visible effect when I ran them, either. However, another similar method that I found in the MedCAT docs, make_stats(), did print something informative:
In [72]: %cpaste -q
# Now print statistics on the CDB after training
cat.cdb.print_stats()
--
In [73]: cat.cdb.make_stats()
Out[73]:
{'Number of concepts': 34724,
'Number of names': 92740,
'Number of concepts that received training': 34724,
'Number of seen training examples in total': 4098991,
'Average training examples per concept': 118.04489690127865}
In [74]: cat.cdb.print_stats()
- The cat.train(...) method apparently worked (as the later cat.get_entities call identified the entity in the test input sentence), but running results = cat.multiprocessing(in_data, nproc=2) yielded empty results (I think it was an empty tuple). Maybe I need a (stronger) GPU card for that to work? I just noticed an update of the MedCAT readme providing an alternative command for installing MedCAT in a CPU-only setup...
Thank you for the further feedback. We don't get much feedback for the tutorials so this is much appreciated!
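As a side note on the make_stats() output quoted in the list above: the reported average looks like the total number of seen training examples divided by the number of trained concepts. A small sketch that checks this, using only the numbers from that output (the dict is copied by hand, not fetched from MedCAT, and the way the average is derived is an inference from the numbers, not something confirmed by the MedCAT source):

```python
import json
import math

# Values copied by hand from the make_stats() output quoted above.
stats = {
    "Number of concepts": 34724,
    "Number of names": 92740,
    "Number of concepts that received training": 34724,
    "Number of seen training examples in total": 4098991,
    "Average training examples per concept": 118.04489690127865,
}

# Recompute the average from the two counts it seems to be derived from.
recomputed = (
    stats["Number of seen training examples in total"]
    / stats["Number of concepts that received training"]
)

# Pretty-print the dict, one key per line.
print(json.dumps(stats, indent=2))
print("recomputed average:", round(recomputed, 4))
print(
    "matches reported value:",
    math.isclose(
        recomputed,
        stats["Average training examples per concept"],
        rel_tol=1e-9,
    ),
)
```

Printing the returned dict this way is also a workable stand-in wherever print_stats() shows nothing.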
With that said, our tutorials are (at least for the time being) targeting Jupyter Notebooks and/or Google Colab. Feature parity in other environments is not guaranteed.
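Since the tutorials assume Jupyter or Colab, a local setup may lack packages that those environments ship with. Below is a minimal sketch for checking this up front, using only the standard library. The module list is an assumption based on the packages mentioned in this thread, and the helper name is hypothetical; note that a package's import name can differ from its pip name.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Modules mentioned in this thread; adjust the list for your own setup.
wanted = ["medcat", "seaborn", "PyQt5"]
missing = missing_modules(wanted)
if missing:
    print("Not importable in this environment:", ", ".join(missing))
else:
    print("All listed modules are importable.")
```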
- [...] seaborn out of the box), I've created a PR for this.
- [...] CDB.print_stats method being used in Part 3.3.
Thanks for the quick and extensive reply again!
Indeed, I am not finding any occurrences of "print_stats" in Part 3.3 of the tutorial, I didn't even download a copy of that one... However, FWIW, I am noticing it's titled "Part 3.2 - Extracting Diseases from Electronic Health Records.ipynb" although the URL has "Part_3_3_Model_technical_optimisations.ipynb" in it... probably a copy-paste error?
I see what you mean now. Though I'm not entirely sure where it grabs the title or how to change it.
Following commands from the MedCAT tutorial on my recently updated Arch Linux, I started by pip-installing medcat==1.8.0, and it failed while installing the transitive dependency tokenizers-0.12.1.
This was with Rust-1.73.0 installed on the system. After downgrading to Rust-1.72.1, the build worked. This post in the discussion of the python-tokenizers package in AUR suggests that requiring tokenizers==0.14.1 instead should make this work (with Rust-1.70.0 or newer).
I am posting this issue here because it effectively causes the instructions of the tutorial to be broken, even though it's probably not an issue that could be easily fixed in the tutorial itself.
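For anyone hitting the same build failure, here is a minimal sketch for checking the local Rust toolchain before installing. It assumes rustc is on PATH and reports its version in the usual "rustc X.Y.Z (...)" form. The helper names are hypothetical, and the 1.73.0 threshold reflects only the single failing/working pair reported above, not a documented limit.

```python
import re
import shutil
import subprocess


def parse_rustc_version(output):
    """Extract (major, minor, patch) from text like 'rustc 1.73.0 (...)'."""
    match = re.search(r"rustc (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def local_rustc_version():
    """Return the installed rustc version, or None if rustc is unavailable."""
    if shutil.which("rustc") is None:
        return None
    result = subprocess.run(
        ["rustc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_rustc_version(result.stdout)


version = local_rustc_version()
if version is None:
    print("rustc not found on PATH (or its version could not be parsed).")
elif version >= (1, 73, 0):
    # Based only on the report above: 1.73.0 failed, 1.72.1 built fine.
    print("rustc", version, "may fail to build tokenizers-0.12.1.")
else:
    print("rustc", version, "is older than the version reported as failing.")
```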