IllDepence / unarXive

A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network
MIT License

Accessing actual figure image files #16

Closed by IIZCODEII 1 year ago

IIZCODEII commented 1 year ago

Hello,

Is there a way to access the actual figure image files using, for instance, their ID?

{'c575cbb5-2504-4327-aa59-d1e6c97c0a53': {'caption': 'Quantum trajectories for harmonic oscillators.In eachcase,the oscillation period,τ=2π/ω=888.57au\\tau =2 \\pi /\\omega =888.57 au.In cases A & D ,m=2000aum=2000 au while in case B,m=200aum =200 au. Case D is a set of classical trajectories (Q=0Q=0)for this system.',
  'type': 'figure'}

Thanks!

IllDepence commented 1 year ago

Hi,

the IDs, such as c575cbb5-2504-4327-aa59-d1e6c97c0a53 in your example, are assigned during the dataset generation.

Because unarXive doesn't contain the figures themselves, and the figures have no unique IDs in the source data on arXiv, there's no dedicated mapping from these IDs to image files.
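
To illustrate what such an ID does resolve to: it only keys a caption and a type, not an image. Below is a minimal sketch, assuming the entries are available as a plain Python dict shaped like the snippet in the question. The `ref_entries` variable name and the `figure_caption` helper are made up for this example, and the caption is shortened for readability.

```python
# Hypothetical excerpt of a paper record, modelled on the snippet in the question.
# The caption is shortened here; the variable name is an assumption.
ref_entries = {
    "c575cbb5-2504-4327-aa59-d1e6c97c0a53": {
        "caption": "Quantum trajectories for harmonic oscillators.",
        "type": "figure",
    },
}


def figure_caption(entries, ref_id):
    """Return the caption stored for a figure ID, or None if the ID is
    unknown or does not refer to a figure."""
    entry = entries.get(ref_id)
    if entry is None or entry.get("type") != "figure":
        return None
    return entry.get("caption")


caption = figure_caption(ref_entries, "c575cbb5-2504-4327-aa59-d1e6c97c0a53")
print(caption)
```

The returned caption text is the only figure-related information the ID leads to; no file name or path is stored alongside it.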

One way to retrieve them might be to

IIZCODEII commented 1 year ago

I'll take a look at that! Thank you!