ben-aaron188 / textwash

GNU General Public License v3.0

Feature Request: Python and PyTorch Upgrade #12

Open MikeB2019x opened 1 year ago

MikeB2019x commented 1 year ago

Would it be possible to upgrade Python to at least 3.10 and PyTorch to 1.12? The PyTorch upgrade in particular would allow use of the MacBook Pro M1 GPU (PyTorch 1.12 added the MPS backend for Apple Silicon), which would be a huge improvement over running on the CPU.
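
For illustration, device selection with an Apple Silicon fallback could look roughly like the sketch below. The helper name `pick_device_name` is hypothetical and not part of textwash; it assumes PyTorch 1.12 or later for `torch.backends.mps`, and it falls back to the CPU whenever PyTorch or a GPU backend is missing.

```python
import importlib.util


def pick_device_name(force_cpu: bool = False) -> str:
    """Return "cuda", "mps", or "cpu" depending on what this machine offers.

    Hypothetical helper for illustration; not part of textwash.
    """
    # Respect an explicit CPU request, and fall back to CPU if torch is absent.
    if force_cpu or importlib.util.find_spec("torch") is None:
        return "cpu"

    import torch

    if torch.cuda.is_available():
        return "cuda"

    # torch.backends.mps only exists in PyTorch 1.12+, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"
```

The returned name can be passed to `torch.device(...)` and the model and tensors moved with `.to(device)`. Whether every operation the model uses is supported on MPS in a given PyTorch release is something I have not checked.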

I tried creating an environment with Python 3.10.11 and PyTorch 1.12, but it doesn't work.

This is the modified requirements file:

torch==1.12
numpy>=1.20.3
transformers>=2.6.0

This is the error, the same one as in issue #11:

(textwash) xxxx@MAC-M1-16-FM73 textwash % python anon.py --input_dir examples --output_dir anonymised_examples --cpu
Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Load input data from 'examples'
Traceback (most recent call last):
  File "/Users/xxx/git/textwash/anon.py", line 61, in <module>
    data[filename[: filename.index(".txt")]] = f.read().strip()
  File "/Users/xxx/opt/miniconda3/envs/textwash/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3131: invalid start byte
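
The traceback shows a file in the input directory that is not valid UTF-8. One possible cause, which I have not confirmed, is a non-text file sitting in `examples`. A minimal sketch of a more tolerant loader is below; `load_text_files` is a hypothetical name, not the code in `anon.py`, and replacing undecodable bytes rather than failing is my assumption about the desired behaviour.

```python
from pathlib import Path


def load_text_files(input_dir: str) -> dict[str, str]:
    """Read every .txt file in input_dir into a {stem: text} dictionary.

    Hypothetical helper for illustration; not the loader used in anon.py.
    """
    data = {}
    for path in sorted(Path(input_dir).iterdir()):
        # Skip directories and anything that is not a .txt file.
        if not path.is_file() or path.suffix != ".txt":
            continue
        # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
        data[path.stem] = path.read_text(encoding="utf-8", errors="replace").strip()
    return data
```

With this approach a stray byte such as 0x80 shows up as a replacement character in the text instead of aborting the run, and files without a `.txt` suffix are ignored.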

ben-aaron188 commented 1 year ago

@maximilianmozes Is this an option? I like the idea of Apple Silicon compatibility for M1, M2, ...

maximilianmozes commented 1 year ago

Thanks for the suggestion @MikeB2019x -- that is a good idea! We'll make sure to update this.