Closed brando90 closed 1 year ago
instead, just call
nougat $SOURCE/$pdf -o $DESTINATION/$pdf/ --markdown --model 0.1.0-base
@lukas-blecher thank you! Curious, how big is it?
250M vs 350M parameters or 1.3GB
How do I make sure I'm using the 1.3B?
On Wed, Sep 20, 2023, 3:01 PM Lukas Blecher @.***> wrote:
250M vs 350M parameters or 1.3GB
By adding the argument --model 0.1.0-base when calling nougat:
nougat $SOURCE/$pdf -o $DESTINATION/$pdf/ --markdown --model 0.1.0-base
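For anyone running this over a whole folder, here is a minimal sketch of a loop that passes the model tag on every call. The names `run_nougat_all`, `MODEL_TAG` and `NOUGAT_BIN` are made up for this example and are not part of nougat. The only flags used are the ones from the command above.

```shell
# Sketch only: run nougat on every PDF in a directory with the base checkpoint.
# MODEL_TAG, NOUGAT_BIN and run_nougat_all are hypothetical names for this example.
MODEL_TAG="0.1.0-base"
NOUGAT_BIN="${NOUGAT_BIN:-nougat}"

run_nougat_all() {
  src_dir="$1"   # directory containing the PDFs
  dst_dir="$2"   # root directory for the outputs
  for pdf_path in "$src_dir"/*.pdf; do
    # Skip the unexpanded pattern when the directory has no PDFs.
    [ -e "$pdf_path" ] || continue
    pdf_name="$(basename "$pdf_path")"
    # Same invocation as in the comment above, one output folder per PDF.
    "$NOUGAT_BIN" "$pdf_path" -o "$dst_dir/$pdf_name/" --markdown --model "$MODEL_TAG"
  done
}
```

Calling `run_nougat_all "$SOURCE" "$DESTINATION"` would then process each PDF in turn. Setting `NOUGAT_BIN=echo` first prints the commands instead of running them, which is a cheap way to check the paths.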
Sorry, I was unclear. I wanted to double-check that it was the 1.3B-parameter model, but your comment confirms it. Thank you!
Just to clear this up: the checkpoint is 1.3 gigabytes in size and the model has 350M parameters (see the paper for more info).
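A quick back-of-the-envelope check shows why the two numbers are easy to mix up. This sketch assumes the weights are stored as 4-byte float32 values, which the thread does not confirm. `estimate_mib` is a name made up for this example.

```shell
# Rough checkpoint size estimate, ASSUMING 4 bytes (float32) per parameter.
# estimate_mib is a hypothetical helper, not part of nougat.
estimate_mib() {
  params="$1"
  # parameters * 4 bytes, converted to MiB with integer division
  echo $(( params * 4 / 1024 / 1024 ))
}

estimate_mib 350000000   # base model:  prints 1335
estimate_mib 250000000   # small model: prints 953
```

Under that assumption, 350M parameters come to roughly 1.3 GiB on disk, and 250M parameters come to a bit under 1 GB. The second figure is in the same ballpark as the 956M pytorch_model.bin download reported in this thread for the small checkpoint.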
Currently the small model is used:
(maf) brando9@ampere1~/data/maf_data/maf_pdfs $ for pdf in $pdfs; do
downloading nougat checkpoint version 0.1.0-small to path /lfs/ampere1/0/brando9/.cache/torch/hub/nougat-0.1.0-small
config.json: 100%|███████████████████████████████████████████████████████████████████████| 557/557 [00:00<00:00, 3.07Mb/s]
pytorch_model.bin: 100%|███████████████████████████████████████████████████████████████| 956M/956M [00:13<00:00, 76.0Mb/s]
special_tokens_map.json: 100%|██████████████████████████████████████████████████████████| 96.0/96.0 [00:00<00:00, 641kb/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████| 2.04M/2.04M [00:00<00:00, 38.1Mb/s]
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████| 106/106 [00:00<00:00, 739kb/s]
How do I use the larger one?
works?
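One way to check which checkpoints have been downloaded is to look in the cache directory. This is a sketch that assumes checkpoints land in folders named nougat-<tag> under ~/.cache/torch/hub, as the download log in this thread suggests. The location may differ on other setups. `list_nougat_checkpoints` is a name made up for this example.

```shell
# Sketch only: list downloaded nougat checkpoint folders.
# ASSUMES the layout seen in the log: <hub_dir>/nougat-<tag>.
# list_nougat_checkpoints is a hypothetical helper, not part of nougat.
list_nougat_checkpoints() {
  hub_dir="${1:-$HOME/.cache/torch/hub}"
  for ckpt_dir in "$hub_dir"/nougat-*; do
    # Skip the unexpanded pattern when nothing matches.
    [ -d "$ckpt_dir" ] || continue
    basename "$ckpt_dir"
  done
}
```

If `list_nougat_checkpoints` prints only nougat-0.1.0-small, the base checkpoint has not been fetched yet. Under the same assumption, a run with --model 0.1.0-base should add a nougat-0.1.0-base folder.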