Anush008 / fastembed-js

Library to generate vector embeddings in NodeJS
https://www.npmjs.com/package/fastembed/
MIT License
57 stars, 3 forks

fast-multilingual-e5-large.tar.gz Access denied #18

Open sorokinvj opened 1 month ago

sorokinvj commented 1 month ago

Hello, I want to use fast-multilingual-e5-large, but when the library tries to download it, I get:

An error occurred: Error: TAR_BAD_ARCHIVE: Unrecognized archive format
    at Unpack.warn (/Users/vladislavsorokin/Projects/tax-chatbot/node_modules/tar/lib/warn-mixin.js:21:40)
    at Unpack.warn (/Users/vladislavsorokin/Projects/tax-chatbot/node_modules/tar/lib/unpack.js:236:18)
    at Unpack.<anonymous> (/Users/vladislavsorokin/Projects/tax-chatbot/node_modules/tar/lib/parse.js:83:14)
    at Unpack.emit (node:events:526:35)
    at [emit] (/Users/vladislavsorokin/Projects/tax-chatbot/node_modules/tar/lib/parse.js:313:12)
    at [maybeEnd] (/Users/vladislavsorokin/Projects/tax-chatbot/node_modules/tar/lib/parse.js:468:17)
    at [consumeChunk] (/Users/vladislavsorokin/Projects/tax-chatbot/node_modules/tar/lib/parse.js:500:21)
    at Unpack.write (/Users/vladislavsorokin/Projects/tax-chatbot/node_modules/tar/lib/parse.js:427:25)
    at Unpack.end (/Users/vladislavsorokin/Projects/tax-chatbot/node_modules/tar/lib/parse.js:548:14)
    at Pipe.end (/Users/vladislavsorokin/Projects/tax-chatbot/node_modules/minipass/index.js:75:17) {
  recoverable: false,
  file: 'local_cache/fast-multilingual-e5-large.tar.gz',
  code: 'TAR_BAD_ARCHIVE',
  tarCode: 'TAR_BAD_ARCHIVE'
}

Then, when I click on fast-multilingual-e5-large.tar.gz, I see a file with this content:

<?xml version='1.0' encoding='UTF-8'?><Error><Code>AccessDenied</Code><Message>Access denied.</Message><Details>Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist).</Details></Error>
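
The XML body above suggests why `tar` reports `TAR_BAD_ARCHIVE`: the file saved as `.tar.gz` appears to be the storage service's error response rather than a gzip archive. A minimal sketch of how one might check this before unpacking is shown below. The helper names `looksLikeGzip` and `describeDownload` are hypothetical and not part of fastembed; the only fact relied on is that gzip streams start with the magic bytes `0x1f 0x8b`.

```typescript
// Hypothetical helpers for illustration; not part of the fastembed API.

// A gzip stream always begins with the two magic bytes 0x1f 0x8b.
function looksLikeGzip(bytes: Uint8Array): boolean {
  return bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b;
}

// Classify the start of a downloaded file as a gzip archive,
// an XML document (such as a storage error response), or unknown.
function describeDownload(bytes: Uint8Array): "gzip" | "xml-error" | "unknown" {
  if (looksLikeGzip(bytes)) return "gzip";

  // Decode only the first few bytes as ASCII to look for an XML prolog.
  let head = "";
  const limit = Math.min(bytes.length, 64);
  for (let i = 0; i < limit; i++) {
    head += String.fromCharCode(bytes[i]);
  }
  if (head.trim().indexOf("<?xml") === 0) return "xml-error";

  return "unknown";
}
```

In a Node project, the bytes could come from `fs.readFileSync("local_cache/fast-multilingual-e5-large.tar.gz")`, since a `Buffer` is a `Uint8Array`. A result of `"xml-error"` would indicate that the download itself failed and the cached file should be deleted rather than unpacked.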

sorokinvj commented 1 month ago

I downloaded the model from Hugging Face, but I am missing model_optimized.onnx.

Anush008 commented 1 month ago

I think this model has a bad Google Cloud Storage source.

We definitely need to move to Hugging Face (HF), like FastEmbed-py and FastEmbed-rs.