facebookresearch / phyre-fwd

Code accompanying paper, Forward Prediction for Physical Reasoning
Apache License 2.0

Pre-trained models not openly accessible #1

Open dido1998 opened 3 years ago

dido1998 commented 3 years ago

The pretrained models are hosted on AWS, which seems to require an AWS key to access. Is there an alternative way to access the models that does not require a key?

rohitgirdhar commented 3 years ago

Hi @dido1998, thanks for your interest. The models are publicly accessible; you should only need a free AWS account to use the s3cmd functionality. You can also download the models directly over HTTP from a browser (like this), but using s3cmd is easier since it lets you view the directory structure.
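
For reference, a minimal sketch of both routes; the bucket prefix and checkpoint path below are the ones that appear in the commands later in this issue, so treat the exact paths as assumptions rather than canonical instructions:

# one-time s3cmd setup with your (free-tier) AWS access key and secret key
s3cmd --configure
# browse the directory structure of the public bucket
s3cmd ls s3://dl.fbaipublicfiles.com/phyre-fwd-agents/
# or fetch an individual file directly over HTTPS, e.g. a checkpoint
wget https://dl.fbaipublicfiles.com/phyre-fwd-agents/joint/001_joint_DEC_1f_win.txt/20/ckpt.00150000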

dido1998 commented 3 years ago

Hi @rohitgirdhar, thank you for the clarification, this is really helpful :-)

Another question: I trained a transformer-based forward model using the following command:

./launch.py -l -c expts/fwd_models/001_fwd_TX_win.txt

I find that over 200,000 training steps the L2 loss does not decrease at all; is this normal? I see the same behavior for the joint model (https://github.com/facebookresearch/phyre-fwd/blob/master/agents/expts/joint/004_joint_TX_win.txt), where the CE loss stays constant at ~0.6.

I have uploaded logs for the above command here: https://drive.google.com/file/d/1IU4wN_3Fl7yjPsTuX1keJmHQSUyuCHXs/view?usp=sharing if that's helpful. Thanks! :-)

rohitgirdhar commented 3 years ago

Hi @dido1998, apologies for the delay in responding. That sounds odd; the L2 loss should decrease. We have also shared all the training logs, which you can refer to for further debugging. For instance, for fold 0 you can find the TensorBoard log at the following link, where I see the loss does go down.

https://dl.fbaipublicfiles.com/phyre-fwd-agents/fwd_models/001_fwd_TX_win.txt/0/logs/events.out.tfevents.1590220334.learnfair0721.31967.0

You can use commands like s3cmd ls s3://dl.fbaipublicfiles.com/phyre-fwd-agents/fwd_models/001_fwd_TX_win.txt/0/logs/ to check for paths to logs for other experiments.
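
As a concrete sketch (the local directory name is arbitrary, and this assumes your s3cmd setup can read the public bucket), you could pull the shared fold-0 logs and compare them against your own run in TensorBoard:

# download the shared fold-0 logs for this experiment
s3cmd sync s3://dl.fbaipublicfiles.com/phyre-fwd-agents/fwd_models/001_fwd_TX_win.txt/0/logs/ reference_logs/
# point TensorBoard at the downloaded logs
tensorboard --logdir reference_logs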

dido1998 commented 3 years ago

Hi @rohitgirdhar, Thanks for sharing the logs! This is really helpful. Unfortunately, I still can't reproduce your loss curve. Here is the tensorboard log I got: https://drive.google.com/file/d/1hSYd39sEPF9FqAtQBbGA_cN_RT5iI59A/view?usp=sharing.

I just want to confirm: for the log you shared above, you ran the following command and selected fold 0 when prompted, right?

./launch.py -l -c expts/fwd_models/001_fwd_TX_win.txt

rohitgirdhar commented 3 years ago

We actually run it on a full node with 8 GPUs. The -l option runs it locally (for debugging) and overrides the config options here with a batch size of 2. Maybe that is the issue? Try commenting out those lines and running again; it should then use 8 GPUs and a larger batch size, and hopefully show a learning curve closer to our logs.

ishaanchandratreya commented 3 years ago

Hi @rohitgirdhar, how much memory does each of your GPUs have? I'm trying to run the Tx experiments (which are naturally quite large) and am running out of CUDA memory fairly quickly on a single fold, even with half your batch size.

rohitgirdhar commented 3 years ago

Hi @ishaanchandratreya, apologies for the delay in responding. I'm not exactly sure for that model, but most experiments were run on 16 GB V100 GPUs, so those models should fit. The log files for the specific experiment you are looking at might have more details too.

ishaanchandratreya commented 3 years ago

Hi @rohitgirdhar, thanks for the info; I was able to sort out the CUDA issue. However, I'm also having trouble setting up the s3cmd functionality: it tells me that s3:listAllBuckets has not been configured for my user. When I try to view your bucket in the AWS console as a public bucket, I get the same message about permissions/actions. I am, however, able to download files over HTTP in a browser. Could you clarify the s3cmd setup/permission instructions for accessing fbaipublicfiles, or provide the full directory structure of what has been uploaded to the phyre-fwd-agents bucket so that I can simply download over HTTP? Thanks a lot in advance.

ishaanchandratreya commented 3 years ago

Just following up on this: I tried accessing the bucket via s3cmd with an alternate configuration and I keep getting access denied. Can you please send s3cmd instructions for the listAllBuckets facility on the public bucket, or alternatively provide the directory structure? With my current config, the instructions under agents/README do not work for me; I get either an SSL certificate error or access denied.

lauragustafson commented 3 years ago

@ishaanchandratreya, you're saying that you don't have the listAllBuckets permission on the S3 bucket, correct?

ishaanchandratreya commented 3 years ago

Yes. I get Permission Denied with both the ls and rsync commands, so I believe more than just the listAllBuckets permission is missing.

jeasinema commented 2 years ago

Same here. I tried this link in my browser and still got access denied. @rohitgirdhar, could you help with this?

rohitgirdhar commented 2 years ago

Hi @jeasinema, that link is not supposed to be accessible from the browser; only individual model/log links are. Does this link to some logs work for you? You can refer to https://github.com/facebookresearch/phyre-fwd/tree/main/agents#downloading-pre-trained-models for instructions on downloading the models.

jeasinema commented 2 years ago

@rohitgirdhar Thanks for the reply. Yes, the logs link works for me. However, when I try this and run the following command:

s3cmd sync --skip-existing s3://dl.fbaipublicfiles.com/phyre-fwd-agents/joint/001_joint_DEC_1f_win.txt/20/ outputs/expts/joint/001_joint_DEC_1f_win.txt/20/

this error popped up:

ERROR: S3 error: 403 (AccessDenied): Access Denied

I also tried to list all the files under that directory:

s3cmd ls s3://dl.fbaipublicfiles.com/phyre-fwd-agents/joint/001_joint_DEC_1f_win.txt/20/ 

and got the same error again.

I can confirm my s3cmd setup is configured correctly, since I can list all the files in my own S3 bucket. Could you help shed some light on this? Thanks.

rohitgirdhar commented 2 years ago

That is indeed strange... the files inside seem to be accessible to me. I can even download them via HTTPS (e.g., path to the model: https://dl.fbaipublicfiles.com/phyre-fwd-agents/joint/001_joint_DEC_1f_win.txt/20/ckpt.00150000). @lauragustafson, would you have any ideas on what might be causing this? For now I'd recommend downloading the models over HTTPS -- you can use "ckpt.%08d" % train.num_iter to get the name of the checkpoint file.
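
A minimal sketch of that HTTPS route, using the checkpoint path from the example above; the iteration count 150000 is just the value in that path, so substitute your experiment's train.num_iter:

# build the checkpoint file name ("ckpt.%08d" % train.num_iter) and fetch it over HTTPS
NUM_ITER=150000
CKPT=$(printf 'ckpt.%08d' "$NUM_ITER")   # -> ckpt.00150000
wget "https://dl.fbaipublicfiles.com/phyre-fwd-agents/joint/001_joint_DEC_1f_win.txt/20/${CKPT}"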

jeasinema commented 2 years ago

@rohitgirdhar I tried your link via HTTPS and it worked for me. However, if I use s3cmd as below

 s3cmd sync --skip-existing s3://dl.fbaipublicfiles.com/phyre-fwd-agents/joint/001_joint_DEC_1f_win.txt/20/ckpt.00150000 .

or

s3cmd get s3://dl.fbaipublicfiles.com/phyre-fwd-agents/joint/001_joint_DEC_1f_win.txt/20/ckpt.00150000 ckpt.00150000

the same error popped up.

I will use HTTPS for now. Thank you so much for the help!