tnzmnjm closed 2 months ago
repourl and create separate DataFrames for each unique domain.
source_code_hosting_platform_dfs directory under the data folder for better management.
github_repo_request_local.py script for the codeberg hosting platform and it works.
git conflict issue for the script utils/initial_data_preparation.py
--input-file and --output-folder command-line argument flags and create separate data frames for distinct domains.
codeberg data frame and it works as expected
--clone-dir data/temp_dir/output_files/codeberg_output_files/cloned_repos
--keep-clones
--input-file data/temp_dir/source_code_hosting_platform_dfs/codeberg.org.csv
--output-file data/temp_dir/output_files/codeberg_output_files/updated_local_github_df_test_count.csv
--ttl-file data/temp_dir/output_files/codeberg_output_files/all_data.ttl
--test-file-list data/temp_dir/output_files/codeberg_output_files/test_files_list.txt
So far, our scripts only consider the github.com platform. We would like to expand our work to cover other source code hosting platforms such as pitchfork.ist, volcanoclient.org, code.wpia.club, etc.
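The per-domain split described in the items above could look roughly like the sketch below. It is not the code in utils/initial_data_preparation.py: it uses only the Python standard library (csv) instead of pandas DataFrames, and the function names domain_of, split_by_domain and main are hypothetical. The repourl column and the --input-file and --output-folder flags come from this thread. The `<domain>.csv` file naming is an assumption modelled on the codeberg.org.csv path above.

```python
import argparse
import csv
from collections import defaultdict
from pathlib import Path
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercased host of a repository URL, or "" if none is found."""
    url = (url or "").strip()
    # urlparse only fills in the host when a scheme or a leading // is present.
    parsed = urlparse(url if "//" in url else "//" + url)
    return (parsed.hostname or "").lower()


def split_by_domain(input_file, output_folder, url_column="repourl"):
    """Write one <domain>.csv per distinct host found in url_column.

    Returns a dict mapping each domain to the path of the file written.
    Rows without a recognisable host are skipped.
    """
    groups = defaultdict(list)
    with open(input_file, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        fieldnames = reader.fieldnames or []
        for row in reader:
            groups[domain_of(row.get(url_column))].append(row)

    out_dir = Path(output_folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for domain, rows in groups.items():
        if not domain:
            continue
        path = out_dir / f"{domain}.csv"
        with open(path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(rows)
        written[domain] = path
    return written


def main(argv=None):
    """Command-line entry point using the two flags named in this thread."""
    parser = argparse.ArgumentParser(description="Split a CSV of repositories by hosting domain.")
    parser.add_argument("--input-file", required=True)
    parser.add_argument("--output-folder", required=True)
    args = parser.parse_args(argv)
    for domain, path in sorted(split_by_domain(args.input_file, args.output_folder).items()):
        print(f"{domain}: {path}")
```

Calling `main(["--input-file", "repos.csv", "--output-folder", "data/source_code_hosting_platform_dfs"])` (repos.csv is a placeholder name) would write one CSV per hosting platform into that folder. Each file can then be passed to the downstream script via its own --input-file flag.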