An Open-Source Python3 tool with SMALL models for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.
After startup, the console prints the following startup log:
(.venv) PS C:\Users\zz\PycharmProjects\Pix2Text> p2t serve -p 8503
C:\Users\zz\PycharmProjects\Pix2Text\.venv\Lib\site-packages\onnxruntime\capi\onnxruntime_validation.py:26: UserWarning: Unsupported Windows version (11). ONNX Runtime supports Windows 10 and above, only.
warnings.warn(
C:\Users\zz\PycharmProjects\Pix2Text\.venv\Lib\site-packages\onnxruntime\capi\onnxruntime_validation.py:26: UserWarning: Unsupported Windows version (11). ONNX Runtime supports Windows 10 and above, only.
warnings.warn(
Fix size testing.
training chunk_sizes: [32]
The output will be saved to C:\Users\zz\PycharmProjects\Pix2Text\.venv\Lib\site-packages\pix2text\doc_xl_layout\..\..\exp\ctdet_subfield\default
heads {'hm': 11, 'cls': 4, 'ftype': 3, 'wh': 8, 'hm_sub': 2, 'wh_sub': 8, 'reg': 2, 'reg_sub': 2}
[INFO 2024-08-15 17:02:03,173 load_model:36] loading model from local file: C:\Users\zz\AppData\Roaming\pix2text\1.1\layout-parser\DocXLayout_231012.pth
[DEBUG 2024-08-15 17:02:03,194 init:109] DocXLayoutParser parameters Namespace(task='ctdet_subfield', dataset='huntie', test=False, data_src='default', exp_id='default', vis_corner=0, convert_onnx=0, onnx_path='auto', debug=0, load_model='C:\Users\zz\AppData\Roaming\pix2text\1.1\layout-parser\DocXLayout_231012.pth', resume=False, gpus=[-1], num_workers=16, not_cuda_benchmark=False, seed=317, print_iter=0, hide_data_time=False, save_all=False, metric='loss', vis_thresh=0.3, nms_thresh=0.3, corner_thresh=0.3, debugger_theme='white', arch='dlav0subfield_34', head_conv=256, down_ratio=4, input_res=768, input_h=768, input_w=768, lr=0.000125, lr_step=[80], NotFixList='', num_epochs=90, batch_size=32, master_batch_size=32, num_iters=-1, val_intervals=5, trainval=False, negative=False, adamW=False, save_dir='C:\Users\zz\PycharmProjects\Pix2Text\.venv\Lib\site-packages\pix2text\doc_xl_layout\..\..\exp\ctdet_subfield\default', flip_test=False, test_scales=[1.0], nms=True, K=100, fix_res=True, keep_res=False, not_rand_crop=False, shift=0.1, scale=0.4, rotate=0, flip=0.5, maskvisual=0.0, maskgrid=0.0, no_color_aug=False, MK=500, rot=True, warp=True, normal_padding=True, extra_channel=False, init_emb='', grid_type='char_point', finetune_emb=False, dic='', sample_limit=-1, aug_rot=0, aug_ddd=0.5, rect_mask=False, kitti_split='3dop', mse_loss=False, num_classes=13, num_secondary_classes=3, reg_loss='l1', hm_weight=1, cls_weight=1, ftype_weight=1, mk_weight=1, off_weight=1, wh_weight=1, hp_weight=1, hm_hp_weight=1, dep_weight=1, dim_weight=1, rot_weight=1, peak_thresh=0.1, norm_wh=False, dense_wh=False, cat_spec_wh=False, not_reg_offset=False, agnostic_ex=False, scores_thresh=0.35, center_thresh=0.3, aggr_weight=0.0, dense_hp=False, not_hm_hp=False, not_reg_hp_offset=False, not_reg_bbox=False, eval_oracle_hm=False, eval_oracle_mk=False, eval_oracle_wh=False, eval_oracle_offset=False, eval_oracle_kps=False, eval_oracle_hmhp=False, eval_oracle_hp_offset=False, eval_oracle_dep=False, gpus_str='-1', reg_offset=True, reg_bbox=True, hm_hp=True, reg_hp_offset=True, pad=0, num_stacks=1, chunk_sizes=[32], root_dir='C:\Users\zz\PycharmProjects\Pix2Text\.venv\Lib\site-packages\pix2text\doc_xl_layout\..\..', data_dir='C:\Users\zz\PycharmProjects\Pix2Text\.venv\Lib\site-packages\pix2text\doc_xl_layout\..\..\data', exp_dir='C:\Users\zz\PycharmProjects\Pix2Text\.venv\Lib\site-packages\pix2text\doc_xl_layout\..\..\exp\ctdet_subfield', debug_dir='C:\Users\zz\PycharmProjects\Pix2Text\.venv\Lib\site-packages\pix2text\doc_xl_layout\..\..\exp\ctdet_subfield\default\debug', mean=array([[[ 0.4079, 0.44719, 0.47026]]], dtype=float32), std=array([[[ 0.28864, 0.27408, 0.2781]]], dtype=float32), output_h=192, output_w=192, output_res=192, heads={'hm': 11, 'cls': 4, 'ftype': 3, 'wh': 8, 'hm_sub': 2, 'wh_sub': 8, 'reg': 2, 'reg_sub': 2}, device='cpu')
[INFO 2024-08-15 17:02:03,194 _get_model:194] use model: C:\Users\zz\AppData\Roaming\cnocr\2.3\densenet_lite_136-gru\cnocr-v2.3-densenet_lite_136-gru-epoch=004-ft-model.onnx
[DEBUG 2024-08-15 17:02:03,194 _get_model:205] ort providers: ['CPUExecutionProvider']
[INFO 2024-08-15 17:02:03,350 _assert_and_prepare_model_files:135] use model: C:\Users\zz\AppData\Roaming\cnstd\1.2\ppocr\ch_PP-OCRv3_det_infer.onnx
[INFO 2024-08-15 17:02:03,804 init:59] Use model path for MFD: C:\Users\zz\AppData\Roaming\pix2text\1.1\mfd-onnx\mfd-v20240618.onnx
[INFO 2024-08-15 17:02:03,806 init:83] Use model dir for LatexOCR: C:\Users\zz\AppData\Roaming\pix2text\1.1\mfr-onnx
[INFO 2024-08-15 17:02:05,428 init:99] Loaded Pix2Text MFR model mfr to: backend-onnx, device-cpu
[DEBUG 2024-08-15 17:02:05,874 init:634] Using proactor: IocpProactor
INFO: Started server process [17284]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8503 (Press CTRL+C to quit)
This step should have started successfully, right?
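One way to double-check that the server is really accepting connections is to probe the port from a second terminal. Below is a minimal sketch using only the Python standard library. The helper name `port_is_open` is hypothetical and not part of Pix2Text; the port 8503 is taken from the `p2t serve -p 8503` command in the log, and the check only confirms that something is listening on that port, not that the Pix2Text endpoints respond correctly.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper for illustration; it is not a Pix2Text API.
    """
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError
        # or a timeout) when nothing is listening on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # The log shows Uvicorn bound to 0.0.0.0 (all interfaces), so the
    # loopback address is used here to reach it from the same machine.
    print(port_is_open("127.0.0.1", 8503))
```

If this prints `True` while `p2t serve` is running and `False` after it is stopped, the listener on port 8503 is most likely the server from the log.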