centre-for-humanities-computing / conspiracies

A python package for discovering and examining conspiracies using NLP.
https://centre-for-humanities-computing.github.io/conspiracies/
MIT License

Failed to install on macOS M1 #85

Open KatHellm opened 2 days ago

KatHellm commented 2 days ago

Description of issue

Package installation fails when building the wheel for tokenizers.

System: macOS M1

Tried: installing and updating Rust and rustup, but the error persists. I also tried Python 3.9 and 3.12, each in a fresh environment, with the same result.

Error details

      warning: variable does not need to be mutable
         --> tokenizers-lib/src/models/unigram/model.rs:265:21
          |
      265 |                 let mut target_node = &mut best_path_ends_at[key_pos];
          |                     ----^^^^^^^^^^^
          |                     |
          |                     help: remove this `mut`
          |
          = note: `#[warn(unused_mut)]` on by default

      warning: variable does not need to be mutable
         --> tokenizers-lib/src/models/unigram/model.rs:282:21
          |
      282 |                 let mut target_node = &mut best_path_ends_at[starts_at + mblen];
          |                     ----^^^^^^^^^^^
          |                     |
          |                     help: remove this `mut`

      warning: variable does not need to be mutable
         --> tokenizers-lib/src/pre_tokenizers/byte_level.rs:200:59
          |
      200 |     encoding.process_tokens_with_offsets_mut(|(i, (token, mut offsets))| {
          |                                                           ----^^^^^^^
          |                                                           |
          |                                                           help: remove this `mut`

      error: casting `&T` to `&mut T` is undefined behavior, even if the reference is unused, consider instead using an `UnsafeCell`
         --> tokenizers-lib/src/models/bpe/trainer.rs:526:47
          |
      522 |                     let w = &words[*i] as *const _ as *mut _;
          |                             -------------------------------- casting happend here
      ...
      526 |                         let word: &mut Word = &mut (*w);
          |                                               ^^^^^^^^^
          |
          = note: for more information, visit <https://doc.rust-lang.org/book/ch15-05-interior-mutability.html>
          = note: `#[deny(invalid_reference_casting)]` on by default

      warning: `tokenizers` (lib) generated 3 warnings
      error: could not compile `tokenizers` (lib) due to 1 previous error; 3 warnings emitted

KasperFyhn commented 1 day ago

The issue does not seem to be specific to conspiracies. It stems from old dependencies in the project, which ultimately pull in a rather old version of tokenizers. Updating the dependencies is long overdue, but some of them carry quite a bit of technical debt, e.g. getting some of the models to run with newer versions of spaCy.

I will look into whether we can move tokenizers to a more recent version and get back to you.

A related issue can be found here.
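
To see which installed packages are responsible for the old tokenizers requirement, something like the following could help. It is only a minimal sketch using the standard-library `importlib.metadata`; the helper names `requirement_name` and `find_dependents` are made up for illustration, and it assumes it is run in an environment where the project's other dependencies are already installed (for example, on a machine where the install succeeds).

```python
import re
from importlib import metadata


def requirement_name(requirement: str) -> str:
    """Return the normalized project name at the start of a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        return ""
    # Normalize like package indexes do: lowercase, runs of "-", "_", "." become "-".
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def find_dependents(target: str) -> dict[str, list[str]]:
    """Map each installed distribution to its requirement lines that name `target`."""
    wanted = requirement_name(target)
    dependents: dict[str, list[str]] = {}
    for dist in metadata.distributions():
        # dist.requires is None when a distribution declares no requirements.
        hits = [req for req in (dist.requires or []) if requirement_name(req) == wanted]
        if hits:
            name = dist.metadata.get("Name") or "<unknown>"
            dependents[name] = hits
    return dependents


if __name__ == "__main__":
    # Print every installed package that declares a requirement on tokenizers,
    # including the version constraint it asks for.
    for name, reqs in sorted(find_dependents("tokenizers").items()):
        for req in reqs:
            print(f"{name} requires {req}")
```

Each printed line shows a package together with the version constraint it places on tokenizers, which should point to the dependency that needs updating first.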