deepset-ai / deepset-cloud-sdk

A Python SDK to interact with deepset Cloud
Apache License 2.0

Ingestion stuck at 99% #169

Closed: ArzelaAscoIi closed this issue 5 months ago

ArzelaAscoIi commented 5 months ago

hotel_reviews (3).zip

Ingestion gets stuck at 99%. Most likely this is caused by a bug in the local file discovery: it discovers more files than are actually ingested, so the upload keeps waiting for files that will never arrive.

python3 -m deepset_cloud_sdk.cli upload ./hotel_reviews --recursive --write-mode OVERWRITE
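A minimal sketch of the suspected mismatch. It assumes, hypothetically, that the waiting logic compares the number of locally discovered files with the number of files that end up in deepset Cloud, and that remote files are keyed by file name only. Neither assumption is confirmed by the SDK code here, and the helper names are made up for illustration.

```python
from pathlib import PurePosixPath


def discovered_count(paths: list[str]) -> int:
    # Hypothetical: local discovery counts every file path it finds.
    return len(paths)


def remote_count(paths: list[str]) -> int:
    # Assumption: files are stored remotely by file name only, so files with
    # the same name in different folders collapse into one remote file.
    return len({PurePosixPath(p).name for p in paths})


def upload_progress(paths: list[str]) -> float:
    # Fraction a progress bar would show once every upload has finished.
    return remote_count(paths) / discovered_count(paths)


paths = ["a/x.txt", "b/x.txt", "a/y.txt"]
print(f"{upload_progress(paths):.0%}")  # 67%: waiting for 100% never returns
```

With only a few name collisions in a large dataset, the same ratio lands just under 100%, which would match the "stuck at 99%" symptom.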
wochinge commented 5 months ago

The dataset contains duplicate file names (the same file name in different subdirectories):

❯ find hotel_reviews ! -name '*.meta.json' ! -name '.DS_Store' -type f | sed 's_.*/__' | sort | uniq -d
b_montmartre_0.txt
b_montmartre_1.txt
b_montmartre_10.txt
b_montmartre_11.txt
b_montmartre_12.txt
b_montmartre_13.txt
b_montmartre_14.txt
b_montmartre_15.txt
b_montmartre_16.txt
b_montmartre_17.txt
b_montmartre_18.txt
b_montmartre_19.txt
b_montmartre_2.txt
b_montmartre_20.txt
b_montmartre_21.txt
b_montmartre_22.txt
b_montmartre_23.txt
b_montmartre_24.txt
b_montmartre_25.txt
b_montmartre_26.txt
b_montmartre_27.txt
b_montmartre_28.txt
b_montmartre_29.txt
b_montmartre_3.txt
b_montmartre_4.txt
b_montmartre_5.txt
b_montmartre_6.txt
b_montmartre_7.txt
b_montmartre_8.txt
b_montmartre_9.txt
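For a platform-independent check before uploading, the `find | sed | sort | uniq -d` pipeline above can be sketched in Python. This is an illustrative helper (`duplicate_basenames` is a hypothetical name, not part of the SDK). Like the shell version, it ignores `*.meta.json` metadata files and `.DS_Store` and reports file names that occur more than once anywhere under the root folder.

```python
import os
from collections import Counter


def duplicate_basenames(root: str) -> list[str]:
    """Return file names that occur more than once under root (hypothetical helper)."""
    names = []
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Skip metadata sidecars and macOS folder metadata, as the find filters do.
            if name.endswith(".meta.json") or name == ".DS_Store":
                continue
            names.append(name)
    counts = Counter(names)
    # Equivalent of `uniq -d`: keep only names seen at least twice.
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__":
    for name in duplicate_basenames("hotel_reviews"):
        print(name)
```

Python's `sorted` uses code-point order, so the output order can differ from `sort` under some locales. The set of reported names is the same.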
wochinge commented 5 months ago

This doesn't work with any write mode (it also happens when not using OVERWRITE).

Options:

wochinge commented 5 months ago
2024-04-05 10:37:13 [warning  ] Skipping file                  file_path=PosixPath('hotel_reviews/hotel_reviews_subset/mercure_paris_champs_elys_es')
2024-04-05 10:37:13 [warning  ] Skipping file                  file_path=PosixPath('hotel_reviews/hotel_reviews_subset/thistle_holborn_the_kingsley')
2024-04-05 10:37:13 [warning  ] Skipping file                  file_path=PosixPath('hotel_reviews/hotel_reviews_subset/tryp_barcelona_apolo_hotel')
2024-04-05 10:37:13 [warning  ] Skipping file                  file_path=PosixPath('hotel_reviews/hotel_reviews_subset/nh_hesperia_barcelona_del_mar')
2024-04-05 10:37:13 [warning  ] Skipping file                  file_path=PosixPath('hotel_reviews/hotel_reviews_subset/shaftesbury_suites_london_marble_arch')
2024-04-05 10:37:13 [warning  ] Skipping file                  file_path=PosixPath('hotel_reviews/hotel_reviews_subset/residence_du_roy')
2024-04-05 10:37:13 [warning  ] Skipping file                  file_path=PosixPath('hotel_reviews/hotel_reviews_subset/majestic_hotel_spa_barcelona_gl')
2024-04-05 10:37:13 [warning  ] Skipping file                  file_path=PosixPath('hotel_reviews/hotel_reviews_subset/camperio_house_suites_apartments')
2024-04-05 10:37:13 [warning  ] Skipping file                  file_path=PosixPath('hotel_reviews/hotel_reviews_subset/grupotel_gran_via_678')
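The skipped paths above have no file extension and look like subdirectories of `hotel_reviews_subset`. That is an assumption, because the log alone doesn't confirm it. If so, a recursive discovery that yields directories as well as files would report them as skipped files. A minimal sketch of a discovery step that keeps only regular files is below. `discover_files` is a hypothetical helper, not the SDK's actual implementation.

```python
from pathlib import Path


def discover_files(root: str, recursive: bool = True) -> list[Path]:
    """Hypothetical discovery helper: collect regular files only.

    Path.rglob("*") also yields directories, so they are filtered out here
    instead of being passed on (and later reported as skipped files).
    """
    base = Path(root)
    candidates = base.rglob("*") if recursive else base.glob("*")
    return sorted(p for p in candidates if p.is_file())
```

Counting progress from this same filtered list keeps the "expected" total consistent with what is actually uploaded.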