Closed: sociengineer closed this issue 1 year ago
When I send a message, there's no response.
@JeremyJonas Re-executing the Step Functions state machine resolved the processing job failure, but it then failed to run VectorStoreIndexTasK.
The cause could not be determined because Lambda did not return an error type. Returned payload: {"errorMessage":"2023-10-23T17:23:38.340Z ed6714b3-dba5-4f6a-b3a1-1ab0d150e4a0 Task timed out after 900.02 seconds"}
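Since the payload carries only an `errorMessage` and no error type, one way to recognize this case is to parse the message text. The following is a minimal sketch, not part of the project: the function name `classify_lambda_failure` and the returned dictionary keys are hypothetical, and the regular expression assumes the message keeps the "timestamp, request ID, Task timed out after N seconds" shape shown in the payload above.

```python
import json
import re

# Lambda's maximum configurable timeout is 900 seconds (15 minutes).
LAMBDA_MAX_TIMEOUT_SECONDS = 900

# Assumed message shape, taken from the payload above:
# "<timestamp> <request id> Task timed out after <seconds> seconds"
_TIMEOUT_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<request_id>\S+)\s+"
    r"Task timed out after (?P<seconds>\d+(?:\.\d+)?) seconds$"
)


def classify_lambda_failure(payload: str) -> dict:
    """Classify a Lambda failure payload (hypothetical helper).

    Returns a dict with a "kind" key: the reported error type if present,
    "Timeout" if the message matches the timeout pattern, else "Unknown".
    """
    data = json.loads(payload)

    # If Lambda did report an error type, trust it.
    if "errorType" in data:
        return {"kind": data["errorType"], "message": data.get("errorMessage", "")}

    message = data.get("errorMessage", "")
    match = _TIMEOUT_RE.match(message)
    if match is None:
        return {"kind": "Unknown", "message": message}

    seconds = float(match.group("seconds"))
    return {
        "kind": "Timeout",
        "request_id": match.group("request_id"),
        "seconds": seconds,
        # True when the task ran up against Lambda's hard 15-minute ceiling,
        # meaning a longer timeout setting cannot help.
        "hit_lambda_max": seconds >= LAMBDA_MAX_TIMEOUT_SECONDS,
    }


if __name__ == "__main__":
    payload = (
        '{"errorMessage":"2023-10-23T17:23:38.340Z '
        'ed6714b3-dba5-4f6a-b3a1-1ab0d150e4a0 '
        'Task timed out after 900.02 seconds"}'
    )
    print(classify_lambda_failure(payload))
```

A timeout at or above 900 seconds suggests the work does not fit in a single Lambda invocation at all, rather than the function being configured with too short a timeout.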
Fixed by #66
@sociengineer - Changed the indexing job from a Lambda function (15-minute maximum) to an ECS task (no timeout) so that the long-running RDS indexing task completes as part of the pipeline Step Functions state machine.
Describe the bug
The SageMaker Processing Job fails to embed the sample dataset into the vector store while running in the Pipeline state machine.
Log from the Processing job:
INSERT INTO all_mpnet_base_v2_768 (id, source_location, document, cmetadata, embeddings) VALUES \n('42af8ad7-db34-40ff-9241-e4241358ec54','s3://dev-galileo-corpusnested-processeddatabucket4e25d-30b26d922em9/cases/30/case1236.txt','According to the State and NMWA, NEPA requires BLM to complete a supplemental EIS specifically analyzing the likely environmental effects of Alternative A-modified before adopting that alternative as the new management plan for the area, and its failure to do so was arbitrary and capricious. An agency must prepare a supplemental assessment if “[t]he agency makes substantial changes in the proposed action that are relevant to environmental concerns.” 24 40 C.F.R. § 1502.9(c)(1)(i) (emphases added). When “the relevant environmental impacts have already been considered” earlier in the NEPA process, no supplement is required. Friends of Marolt Park v. U.S. Dep''t of Transp., 382 F.3d 1088, 1096-97 (10th Cir.2004). In a guide to NEPA published in the Federal Register, the CEQ states that a supplement is unnecessary when the new alternative is “qualitatively within the spectrum of alternatives that were discussed in the draft” and is only a “minor variation” from those alternatives.','{\"example\":\"True\",\"category_id\":\"30\",\"domain\":\"Legal\",\"original_source_url\":\"https://osf.io/qvg8s/files/osfstorage\",\"category\":\"Environmental Law\",\"asset_key_prefix\":\"/cases/30/\",\"collection\":\"casefiles\",\"original_location\":\"https://osf.io/8mjcy#preprocessed_cases[cases_29404]/30\",\"original_source\":\"OSF: SigmaLaw - Large Legal Text Corpus and Word 
Embeddings\",\"source_location\":\"s3://dev-galileo-corpusnested-processeddatabucket4e25d-30b26d922em9/cases/30/case1236.txt\",\"section_index\":71}',array[0.005580055993050337,0.04951924830675125,0.007071380503475666,-0.03756583854556084,-0.017889730632305145,0.009061774238944054,-0.051643624901771545,0.0003611240827012807,-0.09720514714717865,-0.04086209461092949,0.037306345999240875,0.06428683549165726,0.01733892597... (2000 of 17567251)"
Expected Behavior
The sample dataset is successfully embedded into the vector store.
Current Behavior
Reproduction Steps
Deploy the sample dataset stack.
Possible Solution
No response
Additional Information/Context
No response
Environment details (OS name and version, etc.)
No response