P3-Core-Dev-Team / P3-Q-A

This repo is for tracking features

Ingestion of content needs to be improved #11

Closed: harsha-kotha closed this issue 11 months ago

harsha-kotha commented 1 year ago

Environment - Pulte New Dev ADS 3.1.7

https://archon-datastore.platform3solutions.com/pulte/dev/login

App: Recruiting Management
Search: Test_Assign
Table: CANDIDATE_BLOB

Structured (Parquet) size: 9.67 MB
Unstructured size: 12.33 GB
Both sizes above are measured after ingestion.

Ingesting the 12.33 GB took 6 hours 29 minutes.
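As a rough yardstick for how slow this run was, its average throughput can be worked out from the two figures reported. This is a minimal sketch that assumes 1 GB = 1024 MB; the report does not say which convention it uses.

```python
# Average ingestion throughput for the reported run.
# Assumption: 1 GB = 1024 MB (not stated in the report).
size_mb = 12.33 * 1024            # unstructured content size, post ingestion
duration_s = 6 * 3600 + 29 * 60   # 6 hours 29 minutes in seconds

throughput_mb_s = size_mb / duration_s
print(f"average throughput: {throughput_mb_s:.2f} MB/s")
```

At roughly half a megabyte per second, the bottleneck is almost certainly the file handling rather than raw disk or network bandwidth.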

INGESTION_REPORT_07936521-b624-4559-b7b3-8befe66246b7_1690129087974.pdf

ClementJosh21 commented 11 months ago

Fix details: @harsha-kotha, we have made two fixes for this.

  1. The blob transfer to the staging location was changed to move the files batch by batch.
  2. The blob transfer to the warehouse directory is now parallelized.
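The two fixes can be pictured roughly as follows. This is a minimal Python sketch, not the actual ADS implementation: the function names (`batched`, `move_to_staging_in_batches`, `transfer_to_warehouse_parallel`), the `batch_size` and `max_workers` defaults, and the choice to copy (rather than move) files into the warehouse directory are all hypothetical.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Iterator, List


def batched(paths: List[Path], batch_size: int) -> Iterator[List[Path]]:
    """Yield successive fixed-size batches of paths (hypothetical helper)."""
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]


def move_to_staging_in_batches(blobs: List[Path], staging: Path,
                               batch_size: int = 100) -> int:
    """Fix 1 (sketch): move blob files to staging one batch at a time."""
    staging.mkdir(parents=True, exist_ok=True)
    moved = 0
    for batch in batched(blobs, batch_size):
        for blob in batch:
            shutil.move(str(blob), str(staging / blob.name))
            moved += 1
        # Hypothetical hook: a real pipeline could checkpoint or log per batch here.
    return moved


def transfer_to_warehouse_parallel(staging: Path, warehouse: Path,
                                   max_workers: int = 8) -> int:
    """Fix 2 (sketch): copy staged blobs into the warehouse with a thread pool."""
    warehouse.mkdir(parents=True, exist_ok=True)
    staged = sorted(p for p in staging.iterdir() if p.is_file())
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every copy and surfaces the first failure, if any.
        list(pool.map(lambda p: shutil.copy2(p, warehouse / p.name), staged))
    return len(staged)
```

Threads suit this kind of work because each transfer spends most of its time waiting on I/O, so several can overlap; batching keeps each unit of work small enough to track and retry.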

Ingestion report before fix: Job_name_1695983869315.pdf

Ingestion report after fix: Job_name_1695983888069.pdf

Udhayanila8 commented 11 months ago

Blob ingestion performance has been improved.

In 3.1.5 - content count of 224368 records

Ingestion time - 1 hour 26 mins

Before Report -

Job_name_1695983869315 (3).pdf

In 3.1.7 - content count of 224368 records

Ingestion time - 12 mins 7 sec

After Report -

Job_name_1695983888069 (2).pdf
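For a quick comparison of the two runs above, the reported times can be converted to seconds and turned into a speedup factor and a records-per-second rate. This is a minimal sketch; `to_seconds` is a hypothetical helper, and the only inputs are the record count and durations quoted in this comment.

```python
def to_seconds(hours: int = 0, minutes: int = 0, seconds: int = 0) -> int:
    """Convert an h/m/s duration into total seconds (hypothetical helper)."""
    return hours * 3600 + minutes * 60 + seconds


records = 224368
before_s = to_seconds(hours=1, minutes=26)   # 3.1.5 run
after_s = to_seconds(minutes=12, seconds=7)  # 3.1.7 run

speedup = before_s / after_s
print(f"speedup: {speedup:.1f}x")
print(f"records/s before: {records / before_s:.0f}, after: {records / after_s:.0f}")
```

Both runs ingest the same 224368 records, so the speedup ratio and the records-per-second ratio are the same number.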

harsha-kotha commented 11 months ago

Please share the size of the blobs added in both runs, along with the number of blobs ingested in each run. If it is a single blob, please test with a larger number of blobs and share the ingestion report.

ClementJosh21 commented 11 months ago

@harsha-kotha We tested with more than one blob and have attached screenshots of the size and count of the blobs ingested.

Count: (screenshot)

Size: (screenshot)

If this is not satisfactory, kindly help us with the scenarios to test:

  1. Blob count
  2. Each blob size
  3. No. of ingestion set
  4. No. of records

We will try to generate and ingest data matching these scenarios and submit the report.

harsha-kotha commented 11 months ago

Thanks, we will do more rounds of testing from our side. I still see scope for improving ingestion performance.