Closed drzraf closed 1 year ago
Not all heroes wear capes. Will review tomorrow and if everything looks good, we'll ship at as soon as possible. Thanks for putting the time into this.
flake8 error.
Do you have example test upload from this branch?
edit:
Do we need to lock that dep version? Can that be lifted?
I have questions @drzraf
When you answer these three questions I'll merge.
* Do we need the urllib dep locked to that version?
No, I just reverted a previous commit for sake of commit history/revision tracking/reviewing process. I can remove the version lock if you prefer so.
* Can you show me a example upload where the IP address is redacted?
It's not clear to me what do you want and why. A modified file? It easy to generate one:
#!/usr/bin/python3
import json, sys
from tubeup.utils import strip_ip_from_meta
with open(sys.argv[1], 'r') as f:
_, new_meta = strip_ip_from_meta(json.load(f))
print(json.dumps(new_meta))
# Run with test.py tests/test_tubeup_rootdir/downloads/Mountain_3_-_Video_Background_HD_1080p-6iRV8liah8A.info.json
I'm attaching a modified file if that's what you meant: a.json.gz
* Can you show this doesn't trigger false positives?
It's safe because:
r'/ip/[^/]+', r'/ip/REDACTED'
/ip/
is very specific (and I've empirically looked for it among hundreds of info files) and following the observed structure of Youtube URL, is always followed by a (non-empty) IP address./ip/
followed by something which is not an IP address (or using ip=
parameter for some other purpose), nor have I resources and will to do so. The underlying idea is to fix an actual grave concern more than overthink an improbable rare inoffensive future one. Let's keep in mind that, overall, these files contains metadata related to the downloaded video. Some are very useful, other are garbage. In the case of these URL, then often contain expiring tokens/query parameters making them unreachable after a specific amount of time: they are in no way stable or canonical resources/identifier (some suggested to even drop them entirely). BTW, Google could still identify uploaders based on the temporary tokens parameters which this PR doesn't currently redact.
tl;dr : These are mostly unimportant metadata but reasonable efforts have been provided to ensure safely obfuscation so that no side-effect is expected on Youtube and there is very very low probability a significant side-effect happened to the URLs from another provider.
BTW, Google could still identify uploaders based on the temporary tokens parameters which this PR doesn't currently redact.
Useless because most people upload from Linux where they don't normally log in, or use a separate cookie file.
Edit:
Thanks for this I'll cut a new version later.
It's not clear to me what do you want and why.
I was asking for a link to a example upload where you tested this PR. Beyond just the test. I wanted to inspect the results of what it does.
Your PR broke Tubeup. Expect a hard reset on the repo, and extreme scrutiny and testing standards for your PRs in the future.
Would you mind explaining what did it broke? NB: I have diff-ed info json files before/after processing, and visually compared them in order to ensure adequate replacements.
Would you mind explaining what did it broke? NB: I have diff-ed info json files before/after processing, and visually compared them in order to ensure adequate replacements.
This is a lot of effort to not show me a URL.
That PR put me through so much work, I tried to lift then re-added the urllib3 version lock, pushed new versions to pypi. I had to yank 3 releases due to that PR then hard reset Github I couldn't revert your PR due to me doing a bunch of other work after it, which is now all gone.
Personally I'm not seeing a security issue from the IP in a VOD URL. Use a VPS.
show me a URL
, while syntactically correct, makes no sense. URL shown: "https://google.com". Could you please be more explicit?My main question remains : what's wrong with the PR? By answering, it could be improved, tested, and finally merged.
I obviously assume good faith. I also assume only technical implementations details are at stake here (unlike in #311). If, for any reason, you wished submitter's IP addresses to end up published on archive.org, then it'd be definitely preferible to state it formally so that the discussion could be elegantly and quickly closed.
Otherwise, I suggest to reopen the PR and offer details about the problems it may have cause, so that I revamp and improve it.
thx
show me a URL
, while syntactically correct, makes no sense. URL shown: "https://google.com". Could you please be more explicit?
He just wants the URL of a successful upload with this patch. I'd assume your code is right, but only works for YouTube and breaks other sites.
No it broke everything.
@mrpapersonic If you want this PR in, I want it verified it works. So since the author cannot just give me an example URL, if you want this in make sure it works first. Because me and #313 saw how it was broken
@drzraf
Hi, you haven't enabled the ability to make issues on your fork so I'm doing this here.
Every time a item is uploaded to Archive.org the "Scanner" tag on the item lists the script name and version:
Scanner: TubeUp Video Stream Mirroring Application 2023.08.19
BinAnon is requesting you change this, you can make it "tubeup-drzraf" or whatever you want, but not our Scanner ID. Your fork divirges from ours in function significantly, specifically with open yt-dlp flags that allow for non-standard output and we don't want staff or future historians confusing the two. This is for automation of fixing items by IA staff.
Additionally you list a link to our PyPi repo. Please also change that if you're going to fork.
Please change this ASAP.
Thanks.
Long needed, long requested feature to stop leaking contributors/uploaders IP addresses.
145, #146, #192
Here finally implemented by carefully parsing the URL as suggested previously.