breezedeus / Pix2Text

An Open-Source Python3 tool with SMALL models for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.
https://p2t.breezedeus.com
MIT License

Local inference results differ from the Hugging Face demo #145

Open TideDra opened 2 months ago

TideDra commented 2 months ago

Thanks for open-sourcing the model! I tried it out on the Hugging Face demo and the results were very good:

[Screenshot 2024-09-18 23.08.24]

However, when I run the same image locally with the same configuration, the detection results get worse.

Command:

`p2t predict -i "/Users/geary.z/Desktop/截屏2024-09-18 18.10.00.png" --save-debug-res debug.jpg --file-type text_formula --layout-config "{\"scores_thresh\": 0.45}"`

Output:

For a single mutation in the learngenes, we consider its possibility independently across all layers, with each layer exhibiting a probability $p_{m}$ for undergoing this mutation. The likelihood of either increasing or decreasing a specific kernel in eacn layer is then computed as follows:
where $p_{l}^{+}$ decreasing a kernel in
$$
p_{l}^{+}=\alpha\cdot\frac{| K_{l} |} {n_{K}^{l}-| K_{l} |} \quad\mathrm{a n d} \quad p_{l}^{-}=1-p_{l}^{+} \tag{1}
$$
a $p_{l}^{-}$ represent the probabilities of increasing and
 $\l$ -th layer of the learngene.

[Debug image: debug.jpg]

What could be causing this?
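The hand-escaped JSON passed to `--layout-config` is easy to get subtly wrong when copying commands between shells or machines. Below is a minimal sketch, assuming you drive the CLI from Python, that builds the same `p2t predict` invocation with `json.dumps` so the config string is always valid JSON. The image path is a hypothetical placeholder, and only the flags from the command above are used.

```python
import json
import shlex

# Hypothetical placeholder path; substitute your own screenshot.
image_path = "/Users/example/Desktop/screenshot.png"

# Layout options forwarded via --layout-config.
# Only scores_thresh appears in the original command.
layout_config = {"scores_thresh": 0.45}


def build_p2t_command(image: str, layout: dict) -> list[str]:
    """Build the argv list for the `p2t predict` call from the issue."""
    return [
        "p2t", "predict",
        "-i", image,
        "--save-debug-res", "debug.jpg",
        "--file-type", "text_formula",
        # json.dumps emits valid JSON, so no manual \" escaping is needed
        "--layout-config", json.dumps(layout),
    ]


argv = build_p2t_command(image_path, layout_config)
# shlex.join quotes each argument so the printed line is safe to paste into a POSIX shell
print(shlex.join(argv))
```

Passing `argv` directly to `subprocess.run(argv, check=True)` avoids shell quoting altogether, which makes it easier to confirm that two machines really receive identical arguments.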

TideDra commented 2 months ago

Here is the original image:

[Screenshot 2024-09-18 18.10.00]

breezedeus commented 2 months ago

Compare the specific parameter values. The demo code is fully open source, so you can check for differences.
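One way to act on this suggestion is to collect the parameters used by the demo and by the local run into two dictionaries and print every key that differs. The sketch below uses only the standard library. Apart from `scores_thresh` and `text_formula`, which come from the command in this issue, the keys and values (including `device`) are hypothetical placeholders. Copy the real ones from the demo's source and your local invocation.

```python
from typing import Any


def diff_params(demo: dict[str, Any], local: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {key: (demo_value, local_value)} for every key whose value differs.

    A key present on only one side is reported with None for the missing value.
    """
    keys = sorted(demo.keys() | local.keys())
    return {k: (demo.get(k), local.get(k)) for k in keys if demo.get(k) != local.get(k)}


# Hypothetical values for illustration only.
demo_params = {"scores_thresh": 0.45, "file_type": "text_formula", "device": "cpu"}
local_params = {"scores_thresh": 0.45, "file_type": "text_formula", "device": "mps"}

for key, (demo_val, local_val) in diff_params(demo_params, local_params).items():
    print(f"{key}: demo={demo_val!r} local={local_val!r}")
```

An empty result means the configurations match, which points the investigation toward the runtime environment rather than the settings.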

TideDra commented 2 months ago

> Compare the specific parameter values. The demo code is fully open source, so you can check for differences.

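I forked the demo code and deployed it on a Hugging Face Space (CPU), on Google Colab (CPU), and locally (macOS 14, M3). The CPU deployments all produce correct results. Only the M3 run produces much worse results. What could be the reason?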

breezedeus commented 2 months ago

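I haven't seen a similar problem running on my M3. You could check whether the command-line results on the M3 match those from the web version.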
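To run this check, save the Markdown that the M3 command line and the web demo produce for the same image, then diff the two. Below is a minimal sketch using Python's standard `difflib`. The labels `m3_cli` and `web_demo` are arbitrary, and the sample strings are placeholders rather than real Pix2Text output.

```python
import difflib


def compare_outputs(cli_text: str, web_text: str) -> list[str]:
    """Return unified-diff lines between two recognition outputs.

    An empty list means the outputs are line-for-line identical.
    """
    return list(
        difflib.unified_diff(
            cli_text.splitlines(),
            web_text.splitlines(),
            fromfile="m3_cli",
            tofile="web_demo",
            lineterm="",  # splitlines() already stripped newlines
        )
    )


# Placeholder strings; paste the real Markdown produced by each run.
cli_out = "line one\nline two"
web_out = "line one\nline 2"

diff = compare_outputs(cli_out, web_out)
print("identical" if not diff else "\n".join(diff))
```

If the web demo running on the same M3 machine matches the command line, the difference is tied to the hardware or backend rather than to the interface.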