JohnSnowLabs / spark-nlp

State of the Art Natural Language Processing
https://sparknlp.org/
Apache License 2.0

Cache mechanism implementation for metadata.json #14224

Closed: mehmetbutgul closed this 6 months ago

mehmetbutgul commented 6 months ago

Issue: https://github.com/JohnSnowLabs/spark-nlp/issues/14221

This PR implements a cache mechanism for metadata.json. The file is currently larger than 10 MB and will keep growing.

Description

The main logic is:

1. gets3Object returns an object that includes getLastModified(). It contains only a summary and does not download the whole metadata.json file.
2. Check whether the cache contains up-to-date metadata.
3. If the cache is up to date, return the cached metadata. Otherwise, download the file, store it in the cache, and return it.
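The three steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the code in this PR. `MetadataCache`, `CachedMetadata`, `FakeRemote`, and the two injected callables are hypothetical names introduced only for this example. The remote store is faked so the sketch runs without S3 access; in a real setup the first callable would be a cheap last-modified lookup and the second the full download.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class CachedMetadata:
    last_modified: datetime  # remote timestamp observed when the entry was stored
    content: str             # full body of metadata.json


class MetadataCache:
    """Hypothetical cache that is invalidated by the remote last-modified time."""

    def __init__(
        self,
        fetch_last_modified: Callable[[], datetime],
        download: Callable[[], str],
    ) -> None:
        # Both callables are assumptions standing in for the real S3 calls.
        self._fetch_last_modified = fetch_last_modified
        self._download = download
        self._entry: Optional[CachedMetadata] = None

    def get(self) -> str:
        # Step 1: cheap lookup, only the timestamp is retrieved.
        remote_ts = self._fetch_last_modified()
        # Step 2: the cache is fresh if it is at least as new as the remote file.
        if self._entry is not None and self._entry.last_modified >= remote_ts:
            return self._entry.content
        # Step 3: cache miss or stale entry, so download, store, and return.
        content = self._download()
        self._entry = CachedMetadata(remote_ts, content)
        return content


class FakeRemote:
    """In-memory stand-in for the remote bucket, counting full downloads."""

    def __init__(self) -> None:
        self.last_modified = datetime(2024, 1, 1, tzinfo=timezone.utc)
        self.body = '[{"name": "model_a"}]'
        self.downloads = 0

    def head(self) -> datetime:
        return self.last_modified

    def download(self) -> str:
        self.downloads += 1
        return self.body


remote = FakeRemote()
cache = MetadataCache(remote.head, remote.download)

cache.get()  # first call downloads the file
cache.get()  # second call is served from the cache
print(remote.downloads)  # 1

# Simulate a newer metadata.json being published.
remote.last_modified = datetime(2024, 2, 1, tzinfo=timezone.utc)
remote.body = '[{"name": "model_a"}, {"name": "model_b"}]'
cache.get()  # stale entry triggers a fresh download
print(remote.downloads)  # 2
```

Passing the two remote operations in as callables keeps the freshness check separate from the storage client, which makes the cache logic easy to test without network access.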

Motivation and Context

How Has This Been Tested?

Screenshots (if appropriate):

Types of changes

Checklist:

maziyarpanahi commented 6 months ago

I'll let @danilojsl review this as he made the last changes.