danny0838 / webscrapbook

A browser extension that captures web pages to local device or backend server for future retrieval, organization, annotation, and edit. This project inherits from legacy Firefox add-on ScrapBook X.
Mozilla Public License 2.0

On a page is a youtube video - is it possible to download it? And pdf does not download too... #394

Closed leonidrysev closed 2 months ago

leonidrysev commented 2 months ago

Thank you for a GREAT app!!!! For many years I could not find anything like it!

Could you help me? There is a YouTube video on a page; is it possible to download it? Also, the PDF does not download. There is a link to the original file, but the file is not downloaded(( I set the depth to capture linked pages to 2:

CODE:


обеды.pdf

danny0838 commented 2 months ago

Streamed video cannot be captured. This is documented in the known issues.

Linked PDF files can be downloaded (though this is not much different from a normal browser save). However, the PDF link you provided seems to be protected; you need access rights to download it.

leonidrysev commented 2 months ago

Thank you! But about the PDF: if I save it with the right mouse button, it downloads. image

leonidrysev commented 2 months ago

It works!!! I need to set it like this here, right? image

leonidrysev commented 2 months ago

Only the Cyrillic symbols in the URL are wrong, but the URL works. Is it possible to fix the Cyrillic symbols? image

leonidrysev commented 2 months ago

One last question! Is it possible to keep the link to YouTube? Now it is just black. image

danny0838 commented 2 months ago

Only the Cyrillic symbols in the URL are wrong, but the URL works. Is it possible to fix the Cyrillic symbols?

I don't get it. Please be clear about:

  1. what you have done
  2. what is the expected result
  3. what is the actual result

danny0838 commented 2 months ago

One last question! Is it possible to keep the link to YouTube? Now it is just black.

Unfortunately, no.

A streamed video is not just a URL. WSB only sees a blob URL that cannot be accessed directly.

You'll need a specialized tool to download a streamed video.

leonidrysev commented 2 months ago

I recorded a video: https://www.dropbox.com/scl/fi/bb70xoug9vakktoym7kaf/Video_2024-09-22_141730.mp4?rlkey=4gu7nwvnytekg8iyqltopqsjz&dl=0

Name in web is: https://sisterproject.getcourse.ru/pl/fileservice/user/file/download/h/**86f4606649bd3458f7ffb23011b72c67.pdf**

Name in data is ÐппеÑиÑ%20нов.pdf

leonidrysev commented 2 months ago

One last question! Is it possible to keep the link to YouTube? Now it is just black.

Unfortunately, no.

A streamed video is not just a URL. WSB only sees a blob URL that cannot be accessed directly.

You'll need a specialized tool to download a streamed video.

After resetting the options, the YouTube preview shows correctly, with the link. To download the video, I took the links from the data and used https://youtubeplaylist.cc/ )) It's great! Thank you again!)

image

danny0838 commented 2 months ago

I recorded a video: https://www.dropbox.com/scl/fi/bb70xoug9vakktoym7kaf/Video_2024-09-22_141730.mp4?rlkey=4gu7nwvnytekg8iyqltopqsjz&dl=0

Name in web is: https://sisterproject.getcourse.ru/pl/fileservice/user/file/download/h/**86f4606649bd3458f7ffb23011b72c67.pdf**

Name in data is ÐппеÑиÑ%20нов.pdf

I don't see the relationship between the video and the PDF. Please be clearer about it.

leonidrysev commented 2 months ago

Is it possible to use normal symbols in the file name? Not strange ones like these: image

Name in web is: https://sisterproject.getcourse.ru/pl/fileservice/user/file/download/h/**86f4606649bd3458f7ffb23011b72c67.pdf**

Name in data is ÐппеÑиÑ%20нов.pdf

danny0838 commented 2 months ago

Please be clear about the detailed steps of how you captured the PDF file.

leonidrysev commented 2 months ago

Please be clear about the detailed steps of how you captured the PDF file.

I show it in my video) https://www.dropbox.com/scl/fi/bb70xoug9vakktoym7kaf/Video_2024-09-22_141730.mp4?rlkey=4gu7nwvnytekg8iyqltopqsjz&dl=0

I reset the settings and turned on PDF download: image

Clicked: image

Got the PDF in data: image

danny0838 commented 2 months ago

The filename is usually determined by an HTTP response header. Unfortunately, the web page you provided is protected and I cannot access it, so you will have to investigate it yourself.
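As a rough illustration of how a client might derive a filename from such a header, here is a minimal Python sketch. The function name `filename_from_content_disposition` is hypothetical, the regular expressions are a simplification rather than a full parser, and this is not WSB's actual code; it only shows the general idea of preferring the `filename*` parameter over plain `filename`.

```python
import re
from typing import Optional
from urllib.parse import unquote


def filename_from_content_disposition(value: str) -> Optional[str]:
    """Pick a filename from a Content-Disposition value (simplified sketch)."""
    # Extended form: filename*=charset'language'percent-encoded-value
    ext = re.search(r"filename\*\s*=\s*([\w-]+)'[^']*'([^;\s]+)", value, re.IGNORECASE)
    if ext:
        charset, encoded = ext.group(1), ext.group(2)
        return unquote(encoded, encoding=charset)

    # Plain form: filename="quoted string" or filename=token
    plain = re.search(r'filename\s*=\s*(?:"([^"]*)"|([^;\s]+))', value, re.IGNORECASE)
    if plain:
        return plain.group(1) if plain.group(1) is not None else plain.group(2)

    # No filename parameter at all
    return None


header = "inline; filename*=UTF-8''%D0%BE%D0%B1%D0%B5%D0%B4%D1%8B.pdf; filename=_.pdf"
print(filename_from_content_disposition(header))
```

When both parameters are present, the sketch returns the decoded `filename*` value and treats the plain `filename` only as a fallback.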

Please provide the filename you get when you save the PDF file with the browser's native save function.

If possible please also try getting the header of the file:

  1. Open the Network panel of the developer tools (usually by right-clicking on the web page > Inspect > Network).
  2. Click on the PDF file link to visit it.
  3. Find the corresponding URL of the PDF file in the request list, and report the response headers, especially Content-Disposition.

image

danny0838 commented 2 months ago

Thank you. (I deleted the previous comment, which contained sensitive information.)

The Content-Disposition response header of the web app for https://sisterproject.getcourse.ru/pl/fileservice/user/file/download/h/321a8a333dc3b9d0e61272abfcfe31a0.pdf is malformed:

It looks like:

Content-Disposition: inline; filename="обеды.pdf"

where the filename обеды.pdf is incorrectly sent as the raw UTF-8 bytes \xD0\xBE\xD0\xB1\xD0\xB5\xD0\xB4\xD1\x8B.pdf.
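To illustrate how raw bytes like these can turn into a garbled name, here is a minimal Python sketch. It assumes the receiving side interprets the header bytes as Latin-1 (ISO-8859-1); whether the browser or WSB does exactly this for the file in question is an assumption, not something confirmed in this thread.

```python
name = "обеды.pdf"

# The bytes the server puts into the header: the UTF-8 encoding of the name.
raw = name.encode("utf-8")
print(raw)

# Assumption: the receiver reads those bytes as Latin-1, one byte per character.
garbled = raw.decode("latin-1")
print(repr(garbled))

# The damage is reversible as long as no byte was lost along the way.
recovered = garbled.encode("latin-1").decode("utf-8")
print(recovered)
```

Each two-byte Cyrillic letter becomes two unrelated Latin-1 characters, which is why the garbled name begins with characters such as Ð.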

This does not comply with the HTTP header specification: the filename parameter allows only ASCII characters, and any non-ASCII characters should be sent in the filename* parameter with UTF-8 percent-encoding, like:

Content-Disposition: inline; filename*=UTF-8''%D0%BE%D0%B1%D0%B5%D0%B4%D1%8B.pdf; filename=_.pdf
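For anyone maintaining the server side, a header value of this shape could be produced roughly as follows. This is a minimal Python sketch, not the website's real code; the function name `content_disposition` and the fixed ASCII fallback `_.pdf` are assumptions made for illustration.

```python
from urllib.parse import quote


def content_disposition(filename: str, fallback: str = "_.pdf",
                        disposition: str = "inline") -> str:
    """Build a Content-Disposition value with both filename* and filename."""
    # Percent-encode the UTF-8 bytes of the real name for the filename* parameter.
    encoded = quote(filename, safe="")
    # The plain filename parameter carries an ASCII-only fallback (hypothetical choice).
    return f"{disposition}; filename*=UTF-8''{encoded}; filename={fallback}"


print("Content-Disposition: " + content_disposition("обеды.pdf"))
```

Sending both parameters lets clients that understand `filename*` show the real name, while older clients fall back to the ASCII one.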

This is an issue with the website rather than WSB. To get it fixed, you have to ask the website's administrator.

leonidrysev commented 2 months ago

Thank you very much! Happiness to you, and a great project!