UW-xDD / text2graph_llm

An experimental API endpoint to convert text to knowledge graph triplets.
MIT License

Preprocess Location to Mineral pipeline #51

Closed JasonLo closed 3 months ago

JasonLo commented 3 months ago

Run extraction pipeline on CHTC

JasonLo commented 3 months ago

Running the Location to Mineral extraction pipeline on the full dataset on CHTC. ETA: June 13.

JasonLo commented 3 months ago

Some jobs failed with an NCCL error. Maybe it will only work on A100-80GB GPUs? (The previous batch worked with the same setup, so I'm unsure why this issue occurred.) Resubmitted with A100-80GB as an additional restriction.

iross commented 3 months ago

Hmm, when was the previous batch run? And what is the error? We rolled out a small config change to how Docker starts up ~2 weeks ago. I'd be shocked if that was the culprit, but it's a bit suspicious...

JasonLo commented 3 months ago

The previous run was roughly 6 weeks ago.

JasonLo commented 3 months ago

The error involves the NCCL driver. It seems more related to vLLM than CHTC, or possibly their interaction. Using NCCL with a single-GPU job doesn't make sense.

iross commented 3 months ago

Yeah, some unexpected configuration or interaction in the CHTC setting is my main concern. It doesn't really make much sense, but these kinds of things have surprised me in the past.

JasonLo commented 3 months ago

Preprocessing completed.