parkLGW opened 1 year ago
What is the language of the PDF's content? It seems only English is supported so far.
Well, the PDF is in Chinese. Thanks for your reply.
How do I view the .mmd file?
Hey @lucasjinreal, you can use the Mathpix Markdown extension in VS Code.
I got [MISSING_PAGE_EMPTY:1] too, and there was no Chinese in my document, only mathematical equations.
In my output, I also encountered the error `[MISSING_PAGE_FAIL:xxx]`, but not consistently: it appears sporadically in some of the results. Some PDFs yield only a few such errors, while others have more than half of their pages replaced by `MISSING_PAGE_FAIL` after processing. Additionally, on the command line, the number of `WARNING:root:Found repetitions in sample xxx` and `WARNING:root:Skipping page xxx due to repetitions.` messages seems to correlate with the number of `MISSING_PAGE_FAIL` instances in the results. I'm curious which characteristics of a PDF trigger this, as I haven't found any pattern so far.
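A quick way to check this correlation for a given PDF is to count both kinds of messages side by side. Below is a minimal sketch, assuming the .mmd output is read as text and the warnings were saved to a log file (for example by redirecting the command's stderr). The marker and warning formats are copied from this thread; `summarize` is a hypothetical helper, not part of Nougat.

```python
import re

# Marker and warning formats as quoted in this thread; the number is captured.
MARKER_RE = re.compile(r"\[MISSING_PAGE_(FAIL|EMPTY):(\d+)\]")
SKIP_RE = re.compile(r"Skipping page (\d+) due to repetitions")
REPEAT_RE = re.compile(r"Found repetitions in sample (\d+)")


def summarize(mmd_text: str, log_text: str) -> dict:
    """Count failure markers in .mmd output and repetition warnings in a saved log."""
    markers = MARKER_RE.findall(mmd_text)  # list of (kind, page) tuples
    return {
        "missing_fail": sum(1 for kind, _ in markers if kind == "FAIL"),
        "missing_empty": sum(1 for kind, _ in markers if kind == "EMPTY"),
        "repetition_warnings": len(REPEAT_RE.findall(log_text)),
        "skipped_pages": sorted(int(p) for p in SKIP_RE.findall(log_text)),
        "fail_pages": sorted(int(p) for kind, p in markers if kind == "FAIL"),
    }


if __name__ == "__main__":
    # Hypothetical inputs standing in for a real .mmd file and a captured log.
    mmd = "Intro\n\n[MISSING_PAGE_FAIL:2]\n\nBody\n\n[MISSING_PAGE_EMPTY:3]\n"
    log = (
        "WARNING:root:Found repetitions in sample 1\n"
        "WARNING:root:Skipping page 2 due to repetitions.\n"
    )
    print(summarize(mmd, log))
```

Compare the counts first: the page numbers in the markers and in the warnings are not guaranteed to use the same numbering.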
I'm seeing this error a lot. I've attached two examples that consistently produce it: solar.pdf and units.pdf.
None of these images resemble an academic document. Nougat was trained mostly on arXiv papers (which are predominantly in English). It generalizes somewhat to other document types, e.g. older papers, but input images that differ too much from the training domain are expected not to be recognized.
I have been using Nougat on some PDFs and am mainly seeing the issues below. I saw the MISSING_PAGE_FAIL error in many places, so I added the `--no-skipping` argument when running inference, but with it I am seeing:
Could you please suggest ways to move forward here?
Note: My inference data resembles the structure of academic documents.
Here is my file, formula.pdf, for which I am getting [MISSING_PAGE_EMPTY:1].
Here is the output:
where \(\tau\) is the delay time.
Cao's method [64] computes \(E_{1}\) and \(E_{2}\) for embedding dimensions from 1 up to \(D\), the largest embedding dimension used in the calculation. \(E_{1}\) and \(E_{2}\) are defined as follows:
\[E_{1}(d)=\frac{1}{N-d\tau}\sum_{i=1}^{N-d\tau}\left|x_{i+d\tau}-x_{n(i,d)+d\tau}\right| \tag{5.90}\]
\[E_{2}(d)=E_{1}(d+1)/E_{1}(d) \tag{5.91}\]
where \(d\) is the embedding dimension, \(N\) is the number of data points, \(\tau\) is the embedding delay, and \(x_{i+d\tau}\) and \(x_{n(i,d)+d\tau}\) are, respectively, the \(i\)-th vector in the data set and its nearest neighbor in the \(d\)-dimensional phase space.
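Equations (5.90) and (5.91) can be made concrete with a short NumPy sketch that follows them as written above. It is not Cao's reference code; the assumptions not stated in the text are 0-based indices, delay vectors \((x_{i}, x_{i+\tau}, \dots, x_{i+(d-1)\tau})\), nearest neighbors found with the maximum norm, and the hypothetical function names `delay_vectors`, `cao_e1` and `cao_e2`.

```python
import numpy as np


def delay_vectors(x, d, tau, count):
    """Return `count` rows y_i = (x_i, x_{i+tau}, ..., x_{i+(d-1)tau})."""
    return np.stack([x[k * tau : k * tau + count] for k in range(d)], axis=1)


def cao_e1(x, d, tau):
    """E_1(d) from eq. (5.90): mean of |x_{i+d*tau} - x_{n(i,d)+d*tau}|."""
    x = np.asarray(x, dtype=float)
    count = len(x) - d * tau  # the N - d*tau indices i for which x_{i+d*tau} exists
    y = delay_vectors(x, d, tau, count)
    # Pairwise maximum-norm distances between d-dimensional delay vectors.
    dist = np.max(np.abs(y[:, None, :] - y[None, :, :]), axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own nearest neighbor
    nn = np.argmin(dist, axis=1)  # n(i, d)
    idx = np.arange(count)
    return float(np.mean(np.abs(x[idx + d * tau] - x[nn + d * tau])))


def cao_e2(x, d, tau):
    """E_2(d) from eq. (5.91): E_1(d + 1) / E_1(d)."""
    return cao_e1(x, d + 1, tau) / cao_e1(x, d, tau)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(500)  # placeholder series; substitute real data
    for d in range(1, 6):  # D = 5 is an arbitrary choice for illustration
        print(d, cao_e1(x, d, 1), cao_e2(x, d, 1))
```

Because it builds the full pairwise distance matrix, memory grows as \(N^{2}\), so this is only practical for series of a few thousand points; a k-d tree is the usual choice for longer data.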
##### 5.6.1.2 Largest Lyapunov Exponent (LLE)
The basic characteristic of chaotic motion is its extreme sensitivity to initial conditions: the orbits starting from two very close initial values separate exponentially over time. The Lyapunov exponent [66, 67] quantifies this phenomenon.
We use the algorithm of Rosenstein et al. [67] to calculate the LLE. The calculations were carried out with the Tisean package [68], version 3.01. Consider the time series data represented as a trajectory in the embedding space, and assume that we observe a very close return \(s_{n^{\prime}}\) to a previously visited point \(s_{n}\). The distance \(\Delta_{0}=s_{n}-s_{n^{\prime}}\) can then be treated as a small perturbation, which after \(l\) time steps becomes \(\Delta_{l}=s_{n+l}-s_{n^{\prime}+l}\). If one finds that \(|\Delta_{l}|\approx|\Delta_{0}|\,e^{\lambda l}\), then \(\lambda\) is the largest Lyapunov exponent.
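If \(S(\varepsilon,m,t)\) exhibits a linear increase with identical slope for all \(m\) larger than some \(m_{0}\) and for a reasonable range of \(\varepsilon\), then this slope can be taken as an estimate of the largest exponent.
\[S(\varepsilon,m,t)=\left\langle\ln\left(\frac{1}{|U_{n}|}\sum_{s_{n^{\prime}}\in U_{n}}\left|s_{n+t}-s_{n^{\prime}+t}\right|\right)\right\rangle_{n} \tag{5.92}\]
where \(U_{n}\) is the set of neighbors of \(s_{n}\) within distance \(\varepsilon\), \(m\) is the embedding dimension, and \(\langle\cdot\rangle_{n}\) denotes the average over all reference points \(s_{n}\).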
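The stretching curve of eq. (5.92) can be sketched the same way. This is a minimal NumPy illustration under stated assumptions, not the Tisean implementation: \(U_n\) is taken as all other delay vectors within \(\varepsilon\) of \(s_n\) in the maximum norm, \(|s_{n+t}-s_{n'+t}|\) is measured between whole delay vectors (using only the last component is another common choice), temporally adjacent neighbors are not excluded, and `stretching_curve` is a hypothetical name. Note that eq. (5.92) averages over the whole neighborhood \(U_n\), whereas Rosenstein's method follows only the single nearest neighbor of each point.

```python
import numpy as np


def stretching_curve(x, m, tau, eps, t_max):
    """S(eps, m, t) of eq. (5.92) for t = 0..t_max."""
    x = np.asarray(x, dtype=float)
    n_ref = len(x) - (m - 1) * tau - t_max  # reference points with t_max steps of future
    n_vec = n_ref + t_max
    # Delay vectors s_0 .. s_{n_vec-1}, each (x_i, x_{i+tau}, ..., x_{i+(m-1)tau}).
    s = np.stack([x[k * tau : k * tau + n_vec] for k in range(m)], axis=1)
    sums = np.zeros(t_max + 1)
    used = 0
    for n in range(n_ref):
        # U_n: all other reference vectors within eps of s_n (maximum norm).
        d0 = np.max(np.abs(s[:n_ref] - s[n]), axis=1)
        nbrs = np.flatnonzero(d0 < eps)
        nbrs = nbrs[nbrs != n]
        if nbrs.size == 0:
            continue  # points without neighbors do not enter the average over n
        for t in range(t_max + 1):
            dist = np.max(np.abs(s[nbrs + t] - s[n + t]), axis=1)
            sums[t] += np.log(np.mean(dist))
        used += 1
    return sums / used if used else np.full(t_max + 1, np.nan)


if __name__ == "__main__":
    # Placeholder data: the logistic map x -> 3.9 x (1 - x), which is chaotic.
    x = [0.4]
    for _ in range(1999):
        x.append(3.9 * x[-1] * (1.0 - x[-1]))
    S = stretching_curve(x, m=2, tau=1, eps=0.01, t_max=5)
    t = np.arange(len(S))
    # Slope over the first few steps, assumed here to be the linear region.
    slope = np.polyfit(t[:4], S[:4], 1)[0]
    print(S, slope)
```

The slope of `S` against `t` over its initial linear part estimates \(\lambda\) in units of inverse sampling steps; repeating this for several \(m\) and \(\varepsilon\) checks the condition stated above.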
##### 5.6.1.3 Correlation Dimension
The correlation dimension method is used to detect the possible presence of chaos. An algorithm proposed by Grassberger and Procaccia [65] is the most
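The paragraph above breaks off before describing the algorithm, so the following is only a generic sketch of the correlation sum on which the Grassberger and Procaccia approach is built: \(C(r)\) is the fraction of pairs of delay vectors closer than \(r\), and the correlation dimension is estimated as the slope of \(\log C(r)\) against \(\log r\) in a scaling region. The maximum norm, the choice of radii, the logistic-map test data, and the function names are all assumptions for illustration.

```python
import numpy as np


def pair_distances(x, m, tau):
    """Maximum-norm distances between all distinct pairs of m-dimensional delay vectors."""
    x = np.asarray(x, dtype=float)
    count = len(x) - (m - 1) * tau
    y = np.stack([x[k * tau : k * tau + count] for k in range(m)], axis=1)
    dist = np.max(np.abs(y[:, None, :] - y[None, :, :]), axis=2)
    i, j = np.triu_indices(count, k=1)  # each unordered pair once
    return dist[i, j]


def correlation_sum(dists, r):
    """C(r): fraction of pairs closer than r."""
    return float(np.mean(dists < r))


def correlation_dimension(x, m, tau, radii):
    """Slope of log C(r) against log r over `radii` (assumed to be a scaling region)."""
    d = pair_distances(x, m, tau)
    c = np.array([correlation_sum(d, r) for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(c), 1)
    return slope


if __name__ == "__main__":
    x = [0.4]
    for _ in range(999):
        x.append(3.9 * x[-1] * (1.0 - x[-1]))
    radii = np.logspace(-2, -1, 5)
    for m in (1, 2, 3):
        print(m, correlation_dimension(x, m, 1, radii))
```

In practice one plots \(\log C(r)\) for increasing \(m\) and looks for a slope that stops growing with \(m\); the radii passed to `correlation_dimension` must lie inside that scaling region, and every \(C(r)\) must be nonzero for the logarithm to be defined.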
After running the command `nougat xxx.pdf`, I got the .mmd file, but it contains no content. What's the reason?