30350n / inventree_part_import

CLI to import parts from suppliers like DigiKey, LCSC, Mouser, etc. to InvenTree
MIT License

Fix LCSC datasheets #42

Status: Closed (closed by zenermerps 5 months ago)

zenermerps commented 5 months ago

LCSC recently switched the datasheet links on their main product pages to point to a PDF viewer with an ordering panel at its side. As a result, the datasheet the tool now downloads is actually HTML with a .pdf extension. Please update the crawler to download the embedded PDF from the site instead of just the HTML.

Edit: The filename of the PDF itself seems to stay the same between the link on the product page and the actual file, so only the domain and path need to be replaced, e.g.

For the product page https://www.lcsc.com/product-detail/Monitors-Reset-Circuits_LOWPOWER-LP5300B6F_C387703.html, the "datasheet" link is https://datasheet.lcsc.com/lcsc/1912111437_LOWPOWER-LP5300B6F_C387703.pdf, while the actual PDF can be found at https://wmsc.lcsc.com/wmsc/upload/file/pdf/v2/lcsc/1912111437_LOWPOWER-LP5300B6F_C387703.pdf
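The symptom described above (an HTML page saved with a .pdf extension) could be caught with a quick check on the downloaded bytes. This is only a minimal sketch: `looks_like_pdf` is a hypothetical helper, not part of inventree_part_import, and it relies solely on the fact that PDF files begin with the `%PDF-` marker.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload starts with the PDF magic bytes.

    Hypothetical helper for illustration; not the tool's actual code.
    """
    # Tolerate leading whitespace, then compare against the "%PDF-" header.
    return data.lstrip().startswith(b"%PDF-")


if __name__ == "__main__":
    print(looks_like_pdf(b"%PDF-1.7\n..."))           # a real PDF header
    print(looks_like_pdf(b"<!DOCTYPE html><html>"))   # an HTML viewer page
```

A check like this would flag the HTML viewer page before it gets attached as a datasheet, regardless of the file extension in the URL.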

30350n commented 5 months ago

Thanks for reporting and also for presenting the fix right away! ^^

> For product page https://www.lcsc.com/product-detail/Monitors-Reset-Circuits_LOWPOWER-LP5300B6F_C387703.html, with "datasheet" link https://datasheet.lcsc.com/lcsc/1912111437_LOWPOWER-LP5300B6F_C387703.pdf

Hm, I do actually seem to be getting a download redirect from that link too, but I guess the download function doesn't pick it up properly.

I'm now replacing "//datasheet.lcsc.com/" with "//wmsc.lcsc.com/wmsc/upload/file/pdf/v2/" in LCSC datasheet URLs. Could you quickly confirm that this works as intended and fixes the issue?
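For readers following along, the replacement described above can be sketched roughly as follows. The two prefix strings come from this thread; the function name `fix_lcsc_datasheet_url` and the constants are assumptions made for illustration and may not match the actual commit.

```python
# Prefixes quoted in this thread; the names are hypothetical.
OLD_PREFIX = "//datasheet.lcsc.com/"
NEW_PREFIX = "//wmsc.lcsc.com/wmsc/upload/file/pdf/v2/"


def fix_lcsc_datasheet_url(url: str) -> str:
    """Point an LCSC datasheet URL at the direct PDF location.

    URLs that do not contain the old prefix are returned unchanged.
    """
    # Replace only the first occurrence; the filename part stays as-is.
    return url.replace(OLD_PREFIX, NEW_PREFIX, 1)


if __name__ == "__main__":
    print(fix_lcsc_datasheet_url(
        "https://datasheet.lcsc.com/lcsc/1912111437_LOWPOWER-LP5300B6F_C387703.pdf"
    ))
```

Applied to the example link from the report, this should yield the wmsc.lcsc.com URL quoted there, since only the domain and path prefix change.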

zenermerps commented 5 months ago

Tried it with the example I gave above with current git master and it works now, thanks for the quick fix!

30350n commented 5 months ago

Perfect, thanks for confirming!