LCSC: Follow first 'pdfUrl' link to get real datasheet URL

Part-DB / Part-DB-server

Part-DB is an Open source inventory management system for your electronic components

https://docs.part-db.de/

GNU Affero General Public License v3.0


LCSC: Follow first 'pdfUrl' link to get real datasheet URL #582

Closed frank-f closed 6 months ago

frank-f commented 6 months ago

LCSC changed their website. The "pdfUrl" key no longer contains the link to the actual PDF. It now links to a web page that shows the datasheet side by side with an "add to cart" form. That page is useless when Part-DB downloads it as the datasheet, so I patched the LCSC provider to open that first link and extract the actual PDF URL from it. I was getting some strange errors while testing, so I added Referer and User-Agent headers to the request. It's been working 100% for me since then.

PS: Unfortunately I could not find an API method to get the real datasheet URL, so I resorted to scraping the HTML website. If anyone can get their hands on an LCSC API description (official or not), I'll be more than happy to update the code.
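The two-step lookup described above could be sketched roughly as follows. This is not the code from the patch: the function names `fetchWithBrowserHeaders` and `extractPdfUrl`, the header values, and the regular expression are all hypothetical, and the sketch assumes the intermediate page embeds the URL as a `pdfUrl: "..."` string literal.

```php
<?php
declare(strict_types=1);

/**
 * Download a page while sending Referer and User-Agent headers.
 * The header values are placeholders, not the ones used in the patch.
 */
function fetchWithBrowserHeaders(string $url): string
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'GET',
            'header'  => [
                'Referer: https://www.lcsc.com/',
                'User-Agent: Mozilla/5.0 (compatible; example)',
            ],
            'timeout' => 10,
        ],
    ]);

    $html = file_get_contents($url, false, $context);
    if ($html === false) {
        throw new RuntimeException("Could not download $url");
    }

    return $html;
}

/**
 * Find the first pdfUrl value embedded in the page source.
 * Returns null when the key is missing or the value is not a string.
 */
function extractPdfUrl(string $html): ?string
{
    // Capture the quoted string literal that follows the pdfUrl key.
    if (preg_match('/pdfUrl:\s*("(?:[^"\\\\]|\\\\.)*")/', $html, $matches) !== 1) {
        return null;
    }

    // Assumption: the literal only uses JSON-compatible escapes,
    // so it can be decoded as a JSON string.
    $url = json_decode($matches[1]);

    return is_string($url) ? $url : null;
}
```

A caller would pass the URL from the provider's "pdfUrl" field to `fetchWithBrowserHeaders()` and hand the returned HTML to `extractPdfUrl()`. Keeping the parsing in its own function means it can be tested without network access.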

codecov[bot] commented 6 months ago

Codecov Report

Attention: Patch coverage is 0%, with 14 lines in your changes missing coverage. Please review.

Project coverage is 61.63%. Comparing base (9b8d4c5) to head (b62cfd8). Report is 2 commits behind head on master.

:exclamation: Current head b62cfd8 differs from pull request most recent head ff40162. Consider uploading reports for the commit ff40162 to get more accurate results

| Files | Patch % | Lines |
|---|---|---|
| ...ices/InfoProviderSystem/Providers/LCSCProvider.php | 0.00% | 14 Missing :warning: |

Additional details and impacted files

```diff
@@             Coverage Diff              @@
##             master     #582      +/-   ##
============================================
- Coverage     61.69%   61.63%   -0.07%
- Complexity     5460     5464       +4
============================================
  Files           503      503
  Lines         18219    18232      +13
============================================
- Hits          11241    11238       -3
- Misses         6978     6994      +16
```


jbtronics commented 6 months ago

See the review. Other than that, it looks fine, I think. Thank you.

frank-f commented 6 months ago

I don't understand what that Codecov report means. What do I have to do?

jbtronics commented 6 months ago

It's not about the Codecov reports. It's about my review comment on lines 107 and 108:

> This seems unnecessarily complicated. I think you could just do something like `$url = trim($matches[2], '"');` to get rid of the quotes around the URL, instead of constructing a JSON object and decoding it immediately.

frank-f commented 6 months ago

> It's not about the Codecov reports. It's about my review comment on lines 107 and 108:
>
> This seems unnecessarily complicated. I think you could just do something like `$url = trim($matches[2], '"');` to get rid of the quotes around the URL, instead of constructing a JSON object and decoding it immediately.

Oh, sorry, I don't see that comment anywhere.

I took this complicated-looking approach because the string is actually escaped in the page source, and decoding it as JSON seemed like the most fail-safe way to unescape it. It looks like this in the source:

pdfUrl: "https:\u002F\u002Fwmsc.lcsc.com\u002Fwmsc\u002Fupload\u002Ffile\u002Fpdf\u002Fv2\u002Flcsc\u002F2304140030_Microchip-Tech-MCP2562T-E-MF_C191416.pdf",
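
To illustrate the difference between the two approaches discussed here, the following sketch applies both to the literal shown above. The variable names are hypothetical, and it assumes the captured match includes the surrounding double quotes, which `json_decode` needs in order to treat the input as a JSON string.

```php
<?php
declare(strict_types=1);

// Quoted literal as it appears in the page source (single quotes keep the
// backslashes literal in PHP).
$literal = '"https:\u002F\u002Fwmsc.lcsc.com\u002Fwmsc\u002Fupload\u002Ffile\u002Fpdf\u002Fv2\u002Flcsc\u002F2304140030_Microchip-Tech-MCP2562T-E-MF_C191416.pdf"';

// Stripping the quotes alone leaves every \u002F escape in place.
$trimmed = trim($literal, '"');

// Decoding the literal as a JSON string resolves the escapes to "/".
$decoded = json_decode($literal, false, 512, JSON_THROW_ON_ERROR);

echo $trimmed, PHP_EOL;
echo $decoded, PHP_EOL;
```

The first line printed still contains `\u002F` sequences, so it is not a usable URL. The second is the plain `https://wmsc.lcsc.com/...` address. `trim()` alone would only be enough if the page stopped escaping the slashes.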