IntelLabs / control-flag

A system to flag anomalous source code expressions by learning typical expressions from training data
MIT License
1.24k stars, 112 forks

[BUG] Authentication Error, Not Handled Correctly #32

Status: Open. Danc2050 opened this issue 2 years ago.

Danc2050 commented 2 years ago

Describe the bug: When running the download of 100+ C repositories from GitHub, an authentication prompt appears when the progress bar reaches 6%. It asks for the username and then for the password, but the password I enter is then treated as the username.

Exact command to reproduce

python3 download_repos.py -f c100.txt -o training_repo_dir -m clone -p 5

Callstack (if it is a crash bug) or error info

Username for 'https://github.com': [ENTERED USERNAME]
Password for 'https://[ENTERED USERNAME]@github.com': Password for 'https://[ENTERED PASSWORD]!@github.com': Password for 'https://github.com': Password for 'https://[ENTERED USERNAME]@github.com': Password for 'https://ENTERED PASSWORD!@github.com':

After pressing Enter a few times:

remote: Invalid username or password.
fatal: Authentication failed for 'https://github.com/OPCFoundation/UA-AnsiC-Legacy/'

On subsequent presses of Enter, different repositories fail to download (probably the progress display simply does not show the repositories that cannot be downloaded without logging in):

Password for 'https://github.com':
remote: Repository not found.
fatal: Authentication failed for 'https://github.com/brave/browser-ios/'

Expected behavior: Repositories download normally. If authentication is needed, both the username and the password are read correctly. If errors occur, they are handled, for example by re-requesting the user's credentials and not proceeding until valid credentials are supplied.

Environment (please complete the following information):

ControlFlag commit: N/A

Additional context: I am using Windows Subsystem for Linux 2 (WSL 2).

Danc2050 commented 2 years ago

I think some of these repositories either:

a) no longer exist
b) are private

The "Repository not found" message is common:

remote: Repository not found.
fatal: Authentication failed for 'https://github.com/nefarius/ViGEm/'
remote: Repository not found.
fatal: Authentication failed for 'https://github.com/ulli-kroll/mt7610u/'
remote: Repository not found.
fatal: Authentication failed for 'https://github.com/muennich/mupdf/'
remote: Repository not found.
fatal: Authentication failed for 'https://github.com/concurrent-php/ext-async/'
remote: Repository not found.
fatal: Authentication failed for 'https://github.com/ddevault/aerc/'

When I navigate to these links, there is indeed no repository (sometimes not even a user). I will post a list of all the repositories that are no longer valid later.
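As a rough sketch of how such a list could be collected automatically, the Python below scans captured git output for the "Repository not found" / "Authentication failed" pair shown above and extracts the failing URLs. It assumes each "remote:" line directly precedes its matching "fatal:" line, as in the excerpt; output from clones running in parallel may interleave and break that pairing. The function name missing_repos is hypothetical and not part of ControlFlag.

```python
import re

# Matches the line git prints after a failed HTTPS authentication, e.g.
# fatal: Authentication failed for 'https://github.com/nefarius/ViGEm/'
FAILED_URL = re.compile(r"^fatal: Authentication failed for '([^']+)'")


def missing_repos(log_text: str) -> list[str]:
    """Return URLs whose failure line directly follows 'remote: Repository not found.'"""
    missing = []
    prev = ""
    for line in log_text.splitlines():
        line = line.strip()
        match = FAILED_URL.match(line)
        if match and prev == "remote: Repository not found.":
            missing.append(match.group(1))
        prev = line
    return missing


log = """remote: Repository not found.
fatal: Authentication failed for 'https://github.com/nefarius/ViGEm/'
remote: Invalid username or password.
fatal: Authentication failed for 'https://github.com/OPCFoundation/UA-AnsiC-Legacy/'
"""
print(missing_repos(log))  # ['https://github.com/nefarius/ViGEm/']
```

Failures caused by a rejected login ("Invalid username or password") are deliberately left out, since those repositories may still exist.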

Danc2050 commented 2 years ago

Here is the list, as promised. Some repositories have been taken down due to copyright issues. Attachment: non_existant_repos.txt
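One way to skip those repositories is to filter the download list before running download_repos.py. This is a minimal sketch that assumes both c100.txt and non_existant_repos.txt contain one repository URL per line (their actual format is not shown in this thread); the helper names normalize, filter_repo_list, and filter_files are hypothetical.

```python
def normalize(url: str) -> str:
    """Compare URLs without surrounding whitespace, trailing slashes, or a .git suffix."""
    url = url.strip().rstrip("/")
    if url.endswith(".git"):
        url = url[: -len(".git")]
    return url


def filter_repo_list(repo_lines, missing_lines) -> list[str]:
    """Return the non-empty entries of repo_lines that do not appear in missing_lines."""
    missing = {normalize(u) for u in missing_lines if u.strip()}
    return [
        line.strip()
        for line in repo_lines
        if line.strip() and normalize(line) not in missing
    ]


def filter_files(repo_path: str, missing_path: str, out_path: str) -> None:
    """Write a filtered copy of the repo list (assumed layout: one URL per line)."""
    with open(repo_path) as repos, open(missing_path) as missing:
        kept = filter_repo_list(repos, missing)
    with open(out_path, "w") as out:
        out.writelines(url + "\n" for url in kept)
```

For example, filter_files("c100.txt", "non_existant_repos.txt", "c100_filtered.txt") would write a reduced list that could presumably be passed to -f in place of c100.txt.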

I also ran into another issue: my system suddenly reports everything as read-only, and my WSL 2 installation is bricked. This may be a WSL problem, so I won't share details here and will keep this issue focused on the download problem.

jgottschlich commented 2 years ago

Hi @Danc2050 - thanks so much for reporting this to us. @nhasabni and I will take a look at it immediately!

Best, The ControlFlag Team

nhasabni commented 2 years ago

Hi @Danc2050,

Thanks for the report and analysis. We found that some of the repositories are non-existent (as you also found), and for some reason git clone asks for credentials for such repositories. Different versions of git seem to handle this differently, but we found that adding -c core.askPass=echo to the git clone command helps: for non-existent repositories, git simply reports an error and continues rather than waiting for a username and password.
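The workaround can be sketched in Python as follows. This is an illustration, not the code from PR #34: it places -c before the clone subcommand, where git treats it as a per-invocation config override, and the helper names clone_command and clone_repo are hypothetical. Per git's documentation, the GIT_ASKPASS environment variable overrides core.askPass, and a configured credential helper is consulted first, so either could still change the behavior on a given machine.

```python
import subprocess


def clone_command(url: str, dest: str) -> list[str]:
    # A "-c key=value" placed before the subcommand overrides git config for this
    # one invocation. With core.askPass=echo, git runs `echo "<prompt>"` and uses
    # its output as the answer instead of prompting on the terminal, so a
    # repository that demands credentials fails instead of blocking.
    return ["git", "-c", "core.askPass=echo", "clone", url, dest]


def clone_repo(url: str, dest: str) -> bool:
    """Clone one repository; on failure, report git's last error line and return False."""
    result = subprocess.run(clone_command(url, dest), capture_output=True, text=True)
    if result.returncode != 0:
        lines = result.stderr.strip().splitlines()
        reason = lines[-1] if lines else f"exit code {result.returncode}"
        print(f"skipping {url}: {reason}")
        return False
    return True
```

A download loop would call clone_repo for each URL and simply move on to the next one when it returns False, which matches the "report an error and continue" behavior described above.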

PR #34 should fix this issue. Do you want to give it a try? Let us know.

Thanks, The ControlFlag team