bricksdont / sign-sockeye-baselines

Baseline systems for the WMT shared task on sign language translation
https://www.wmt-slt.com/
MIT License

Readme instructions: FocusNews download: Zenodo email: did not send me an access token in my email #3

Open cleong110 opened 2 months ago

cleong110 commented 2 months ago

Expected: "You will the be sent an email with a private link. The last portion of the private link is an access token."

Result: I received the link "https://zenodo.org/records/6631159", which is the public link with no access token.

cleong110 commented 2 months ago

I plan to follow the Zenodo developer instructions to create a personal access token next. (https://developers.zenodo.org/#authentication)

If that works I will submit a pull request to update the documentation.

Edit: the correct link is actually https://developers.zenodo.org/#creating-a-personal-access-token

cleong110 commented 2 months ago

Creating an access token: [screenshot of the Zenodo token scope options] It seems there's no scope for downloads?

Further Googling shows this thread with download instructions: https://github.com/zenodo/zenodo/issues/1888

None of this clears it up. I will proceed by simply making an "all scopes" token.
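As a sanity check that a token works at all, the first example from the Zenodo developer docs can be run against the API; nothing here is specific to this dataset, and $ZENODO_TOKEN stands in for the token:

# Should return HTTP 200 and a JSON list if the token is valid
curl -i "https://zenodo.org/api/deposit/depositions?access_token=$ZENODO_TOKEN"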

cleong110 commented 2 months ago

Running the scripts: I ran into absolute-path issues, "no such directory" for "/net/cephfs/shares/volk.cl.uzh/mathmu/sign-sockeye-baselines". I will add a fix for that to the pull request, I suppose.

Edit: that was actually addressed in the README. Nevertheless, I made a version that resolves paths relative to the script directory. Also, I had to switch from run_generic.sh to run_generic_no_slurm.sh, as noted in the README. (A minimal sketch of the path fix follows below.)
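For reference, roughly the kind of change I mean, in bash; the variable names are mine, not the repo's:

# Resolve paths relative to this script's own location instead of
# the hard-coded cluster path
here=$(dirname "$(readlink -f "$0")")   # .../scripts/running
base=$(dirname "$(dirname "$here")")    # repo root, two levels up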

Now I'm stuck on EnvironmentLocationNotFound: Not a conda environment: /home/vlab/projects/sign_language_processing/replication/sign-sockeye-baselines/venvs/sockeye3

Ah, that would be this: [screenshot]

I guess this implies I do need to run the "basic setup" part first. Lemme go look at that.

OK, now I see that that script has "module load volta anaconda3". I don't have "module" as a command. Do I have to install it, à la https://installati.one/install-environment-modules-ubuntu-20-04/, just to run conda in the expected way? I wonder what "volta" is. Maybe this? https://docs.volta.sh/guide/

Could also be https://pypi.org/project/volta/

Manually creating a conda env seemed to work successfully. I got some scripts to run, at least; now it seems to be stuck on downloading the Zenodo set.
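Roughly what I did, for anyone following along (the Python version is a guess; the prefix matches the path the run script expects):

# Create the env where the scripts look for it, bypassing "module load"
conda create -y --prefix ./venvs/sockeye3 python=3.10
conda activate ./venvs/sockeye3
# then install dependencies the same way the basic setup script does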

It will make the directories but doesn't seem to download the files... [screenshot]

Investigating further: the "files" field in the API's JSON response is empty. [screenshot]

OK, even going step by step manually with curl, I cannot get the download to work.
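This is roughly the request I was making, with the record ID from the link above and the token parameter as documented by Zenodo:

# List the record's files; for me this came back empty
curl "https://zenodo.org/api/records/6631159?access_token=$ZENODO_TOKEN" | jq '.files'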

Time to just manually download the .zip file and make the directories myself.
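Something like the following, where the archive name and target directory are placeholders; the real layout is whatever the download script would have created:

# Manual fallback: download the archive from the record page in a browser,
# then unpack it where the script would have put it
mkdir -p data/download/focusnews                          # hypothetical path
unzip ~/Downloads/focusnews.zip -d data/download/focusnews  # hypothetical filename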

cleong110 commented 2 months ago

All right, I got partway through the "run_generic_no_slurm" process, but it seems to be failing on the preprocessing?

[screenshot]

Inspecting the logs: "Permission denied" at line 144 of run_generic_no_slurm.sh, where it invokes the preprocessing script.

Edit: I have tried and tried and tried to get more information, but other than that one error I just cannot get any output.

/home/vlab/projects/sign_language_processing/replication/sign-sockeye-baselines/scripts/running/run_generic_no_slurm.sh: line 144: /home/vlab/projects/sign_language_processing/replication/sign-sockeye-baselines/scripts/preprocessing/preprocess_generic.sh: Permission denied

I can't even get it to write to a "foo.txt" on the first line of the script!

OH! The preprocessing scripts are not executable! Right!!

[screenshot]

That's easily fixed: [screenshot]
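For the record, the fix is a one-liner; the directory comes from the error message above, the glob is mine:

# Mark the preprocessing scripts as executable
chmod +x scripts/preprocessing/*.sh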

cleong110 commented 2 months ago

Now it crashes on "prepare"!

Wait no, it got through that and crashed on "train".

2024-04-29 18:06:19.928746: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-29 18:06:20.018302: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2024-04-29 18:06:20.018325: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2024-04-29 18:06:20.496638: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2024-04-29 18:06:20.496704: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2024-04-29 18:06:20.496712: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
usage: train.py [-h] [--config CONFIG] [--source SOURCE]
                [--source-is-continuous]
                [--source-continuous-num-features SOURCE_CONTINUOUS_NUM_FEATURES]
                [--target-is-continuous]
                [--target-continuous-num-features TARGET_CONTINUOUS_NUM_FEATURES]
                [--source-factors SOURCE_FACTORS [SOURCE_FACTORS ...]]
                [--source-factors-use-source-vocab SOURCE_FACTORS_USE_SOURCE_VOCAB [SOURCE_FACTORS_USE_SOURCE_VOCAB ...]]
                [--target-factors TARGET_FACTORS [TARGET_FACTORS ...]]
                [--target-factors-use-target-vocab TARGET_FACTORS_USE_TARGET_VOCAB [TARGET_FACTORS_USE_TARGET_VOCAB ...]]
                [--target TARGET] [--prepared-data PREPARED_DATA]
                --validation-source VALIDATION_SOURCE
                [--validation-source-factors VALIDATION_SOURCE_FACTORS [VALIDATION_SOURCE_FACTORS ...]]
                --validation-target VALIDATION_TARGET
                [--validation-target-factors VALIDATION_TARGET_FACTORS [VALIDATION_TARGET_FACTORS ...]]
                [--no-bucketing] [--bucket-width BUCKET_WIDTH]
                [--bucket-scaling] [--max-seq-len MAX_SEQ_LEN]
                [--source-vocab SOURCE_VOCAB] [--target-vocab TARGET_VOCAB]
                [--source-factor-vocabs SOURCE_FACTOR_VOCABS [SOURCE_FACTOR_VOCABS ...]]
                [--target-factor-vocabs TARGET_FACTOR_VOCABS [TARGET_FACTOR_VOCABS ...]]
                [--shared-vocab] [--num-words NUM_WORDS]
                [--word-min-count WORD_MIN_COUNT]
                [--pad-vocab-to-multiple-of PAD_VOCAB_TO_MULTIPLE_OF] --output
                OUTPUT [--overwrite-output] [--params PARAMS]
                [--allow-missing-params] [--ignore-extra-params]
                [--encoder {transformer}]
                [--decoder {transformer,ssru_transformer}]
                [--num-layers NUM_LAYERS]
                [--transformer-model-size TRANSFORMER_MODEL_SIZE]
                [--transformer-attention-heads TRANSFORMER_ATTENTION_HEADS]
                [--transformer-feed-forward-num-hidden TRANSFORMER_FEED_FORWARD_NUM_HIDDEN]
                [--transformer-feed-forward-use-glu]
                [--transformer-activation-type TRANSFORMER_ACTIVATION_TYPE]
                [--transformer-positional-embedding-type {none,fixed,learned}]
                [--transformer-preprocess TRANSFORMER_PREPROCESS]
                [--transformer-postprocess TRANSFORMER_POSTPROCESS]
                [--lhuc COMPONENT [COMPONENT ...]] [--num-embed NUM_EMBED]
                [--source-factors-num-embed SOURCE_FACTORS_NUM_EMBED [SOURCE_FACTORS_NUM_EMBED ...]]
                [--target-factors-num-embed TARGET_FACTORS_NUM_EMBED [TARGET_FACTORS_NUM_EMBED ...]]
                [--source-factors-combine {sum,average,concat} [{sum,average,concat} ...]]
                [--target-factors-combine {sum,average,concat} [{sum,average,concat} ...]]
                [--source-factors-share-embedding SOURCE_FACTORS_SHARE_EMBEDDING [SOURCE_FACTORS_SHARE_EMBEDDING ...]]
                [--target-factors-share-embedding TARGET_FACTORS_SHARE_EMBEDDING [TARGET_FACTORS_SHARE_EMBEDDING ...]]
                [--weight-tying-type {none,src_trg_softmax,src_trg,trg_softmax}]
                [--dtype {float32,float16}] [--amp] [--apex-amp]
                [--batch-size BATCH_SIZE]
                [--batch-type {sentence,word,max-word}]
                [--batch-sentences-multiple-of BATCH_SENTENCES_MULTIPLE_OF]
                [--update-interval UPDATE_INTERVAL]
                [--label-smoothing LABEL_SMOOTHING]
                [--label-smoothing-impl {mxnet,fairseq,torch}]
                [--length-task {ratio,length}]
                [--length-task-weight LENGTH_TASK_WEIGHT]
                [--length-task-layers LENGTH_TASK_LAYERS]
                [--continuous-loss {soft-dynamic-time-warping,pose-mse}]
                [--sdtw-weight SDTW_WEIGHT]
                [--continuous-length-task-weight CONTINUOUS_LENGTH_TASK_WEIGHT]
                [--pose-mse-weight POSE_MSE_WEIGHT]
                [--pose-noise-weight POSE_NOISE_WEIGHT]
                [--predict-deltas {frame-deltas,first-frame}]
                [--continuous-target-length {linear,counters}]
                [--visualize-first-validation-pose VISUALIZE_FIRST_VALIDATION_POSE]
                [--pose-config POSE_CONFIG] [--no-reloading-params]
                [--target-factors-weight TARGET_FACTORS_WEIGHT [TARGET_FACTORS_WEIGHT ...]]
                [--optimized-metric {perplexity,accuracy,length-ratio-mse,bleu,chrf,rouge1,ter,soft-dynamic-time-warping,original-dynamic-time-warping,pose-mse,frame-position-prediction}]
                [--checkpoint-interval CHECKPOINT_INTERVAL]
                [--min-samples MIN_SAMPLES] [--max-samples MAX_SAMPLES]
                [--min-updates MIN_UPDATES] [--max-updates MAX_UPDATES]
                [--max-seconds MAX_SECONDS]
                [--max-checkpoints MAX_CHECKPOINTS]
                [--max-num-checkpoint-not-improved MAX_NUM_CHECKPOINT_NOT_IMPROVED]
                [--checkpoint-improvement-threshold CHECKPOINT_IMPROVEMENT_THRESHOLD]
                [--min-num-epochs MIN_NUM_EPOCHS]
                [--max-num-epochs MAX_NUM_EPOCHS]
                [--embed-dropout EMBED_DROPOUT]
                [--fc-embed-dropout-pre FC_EMBED_DROPOUT_PRE]
                [--fc-embed-dropout-post FC_EMBED_DROPOUT_POST]
                [--transformer-dropout-attention TRANSFORMER_DROPOUT_ATTENTION]
                [--transformer-dropout-act TRANSFORMER_DROPOUT_ACT]
                [--transformer-dropout-prepost TRANSFORMER_DROPOUT_PREPOST]
                [--optimizer {adam,sgd}] [--optimizer-betas OPTIMIZER_BETAS]
                [--optimizer-eps OPTIMIZER_EPS] [--dist]
                [--initial-learning-rate INITIAL_LEARNING_RATE]
                [--weight-decay WEIGHT_DECAY] [--momentum MOMENTUM]
                [--gradient-clipping-threshold GRADIENT_CLIPPING_THRESHOLD]
                [--gradient-clipping-type {abs,norm,none}]
                [--learning-rate-scheduler-type {none,inv-sqrt-decay,linear-decay,plateau-reduce}]
                [--learning-rate-t-scale LEARNING_RATE_T_SCALE]
                [--learning-rate-reduce-factor LEARNING_RATE_REDUCE_FACTOR]
                [--learning-rate-reduce-num-not-improved LEARNING_RATE_REDUCE_NUM_NOT_IMPROVED]
                [--learning-rate-warmup LEARNING_RATE_WARMUP]
                [--fixed-param-strategy {all_except_decoder,all_except_outer_layers,all_except_embeddings,all_except_output_proj,all_except_feed_forward,encoder_and_source_embeddings,encoder_half_and_source_embeddings}]
                [--fixed-param-names [FIXED_PARAM_NAMES [FIXED_PARAM_NAMES ...]]]
                [--decode-and-evaluate DECODE_AND_EVALUATE]
                [--stop-training-on-decoder-failure] [--seed SEED]
                [--keep-last-params KEEP_LAST_PARAMS] [--keep-initializations]
                [--cache-last-best-params CACHE_LAST_BEST_PARAMS]
                [--cache-strategy {best,last,lifespan}]
                [--cache-metric {perplexity,accuracy,length-ratio-mse,bleu,chrf,rouge1,ter,soft-dynamic-time-warping,original-dynamic-time-warping,pose-mse,frame-position-prediction}]
                [--dry-run] [--device-id DEVICE_ID] [--use-cpu] [--env ENV]
                [--quiet] [--quiet-secondary-workers] [--no-logfile]
                [--loglevel {INFO,DEBUG,ERROR}]
                [--loglevel-secondary-workers {INFO,DEBUG,ERROR}]
                [--pose-type {holistic,openpose}]
train.py: error: argument --validation-source/-vs: must exist and be a regular file.

Came back the next day and had another look. I guess the validation set was not where train.py expected it to be.
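A quick way to confirm, assuming the full train.py command was echoed into a log (the log filename here is a stand-in):

# Find what was actually passed as --validation-source, then check it exists
vs=$(grep -oP '(?<=--validation-source )\S+' train.log)
test -f "$vs" && echo "found: $vs" || echo "missing: $vs"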

cleong110 commented 2 months ago

Well anyway, I think at this point I'm getting severely off topic from the original Zenodo issue. I'll continue this at https://github.com/cleong110/sign-sockeye-baselines/issues/1