carlosug / install.rse

Repository for the paper "detecting installation instructions", MSR 2025
MIT License

calculate and visualise statistics for the dataset #6

Open carlosug opened 1 month ago

carlosug commented 1 month ago

Developed the script somef -> statistics.py. It does:

carlosug commented 1 month ago

Visualisation is in readmes_statistics.py. It does:

carlosug commented 1 month ago

basic_statistics.py provides more detailed information:

carlosug commented 1 month ago

updates:

carlosug commented 1 month ago

updates:

carlosug commented 4 weeks ago

Visualisation is in readmes_statistics.py. It does:

* Keyword Extraction: For each result in an item, the script preprocesses the text, uses CountVectorizer to find the top 10 keywords, and updates the keyword counters.

* Heatmap Preparation: The script aggregates keywords for each repository and creates a DataFrame of the top 10 keywords.

* Heatmap Plotting: A heatmap is created for each repository, showing the top 10 keywords and their counts.
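The three steps above can be sketched roughly as follows. This is a minimal, stdlib-only sketch, not the code in readmes_statistics.py: it swaps scikit-learn's CountVectorizer for a regex tokenizer that mimics CountVectorizer's defaults (lowercasing and the token pattern `(?u)\b\w\w+\b`). The stop-word list, the input shape (repository name mapped to a list of result texts) and the function names `preprocess`, `top_keywords` and `keyword_matrix` are hypothetical.

```python
import re
from collections import Counter

# Approximates CountVectorizer's defaults: lowercase=True and
# token_pattern r"(?u)\b\w\w+\b" (tokens of two or more word characters).
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "and", "to", "of", "in", "for", "with", "is", "on", "this"}


def preprocess(text: str) -> list[str]:
    """Lowercase, tokenise and drop stop words."""
    return [t for t in TOKEN_RE.findall(text.lower()) if t not in STOP_WORDS]


def top_keywords(text: str, k: int = 10) -> list[tuple[str, int]]:
    """Keyword extraction: the k most frequent tokens in one result's text."""
    return Counter(preprocess(text)).most_common(k)


def keyword_matrix(repos: dict[str, list[str]], k: int = 10):
    """Heatmap preparation: aggregate keywords per repository.

    Returns (keywords, rows), where keywords are the k most frequent
    keywords overall and rows[repo][i] is the count of keywords[i] in repo.
    """
    per_repo: dict[str, Counter] = {}
    overall: Counter = Counter()
    for repo, results in repos.items():
        counter: Counter = Counter()
        for text in results:
            # Update the repository's keyword counter with this result's top k.
            counter.update(dict(top_keywords(text, k)))
        per_repo[repo] = counter
        overall.update(counter)
    keywords = [word for word, _ in overall.most_common(k)]
    rows = {repo: [c[word] for word in keywords] for repo, c in per_repo.items()}
    return keywords, rows


if __name__ == "__main__":
    example = {
        "repo-a": ["pip install foo", "Install with pip: pip install foo[extras]"],
        "repo-b": ["docker build -t foo .", "docker run foo"],
    }
    keywords, rows = keyword_matrix(example, k=3)
    print(keywords)  # ['foo', 'pip', 'install']
    print(rows)      # {'repo-a': [2, 3, 3], 'repo-b': [2, 0, 0]}
```

Each `rows[repo]` list corresponds to one repository's heatmap data; if pandas is available, `pd.DataFrame.from_dict(rows, orient="index", columns=keywords)` turns it into a table that a plotting library can render.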

readme_statistics.py was removed, as it did not provide any value.

carlosug commented 4 weeks ago

Developed the script somef -> statistics.py. It does:

* Preprocessing Functions: Functions to preprocess text and check for external links.

* Initialize Statistics: Initializes dictionaries to count installation methods, results, code blocks, images, hyperlinks, and tokens for each repository.

* Process Data: Iterates through each key in the JSON data to populate the dictionaries.

* Calculate Installation Methods: Calculates the number of installation methods per repository.

* Keyword Analysis: Analyzes the occurrence of specific installation keywords.

* Trends and Observations: Prints trends based on the keyword analysis.

* Installation Types: Categorizes installation types based on keywords.

* Compile Statistics: Compiles overall statistics into a DataFrame and prints it.
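A minimal sketch of the per-repository counting described above, assuming the JSON data has already been reduced to a mapping from repository name to a list of extracted README texts (real somef output is richer than this). The installation-type keyword table, the regexes for Markdown images and links, the rule that only absolute http(s) URLs count as external, and all function names are hypothetical illustrations, not the logic of statistics.py.

```python
import re
from collections import Counter

# Hypothetical keyword -> installation-type table (illustrative only).
INSTALL_TYPES = {
    "package_manager": ("pip install", "conda install", "npm install", "apt-get install"),
    "container": ("docker", "singularity"),
    "from_source": ("git clone", "setup.py", "make install"),
}

FENCE = "`" * 3  # Markdown code-fence marker
FENCE_RE = re.compile("^" + FENCE, re.MULTILINE)
IMAGE_RE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")     # ![alt](url)
LINK_RE = re.compile(r"(?<!!)\[[^\]]*\]\(([^)\s]+)[^)]*\)")  # [text](url), not images
TOKEN_RE = re.compile(r"\b\w+\b")


def is_external_link(url: str) -> bool:
    """Count absolute http(s) URLs as external; relative paths and anchors are not."""
    return url.startswith(("http://", "https://"))


def installation_types(text: str) -> set[str]:
    """Return the installation categories whose keywords appear in the text."""
    lowered = text.lower()
    return {kind for kind, kws in INSTALL_TYPES.items() if any(kw in lowered for kw in kws)}


def keyword_occurrences(texts: list[str]) -> Counter:
    """Count substring occurrences of each installation keyword across the texts."""
    counts: Counter = Counter()
    for text in texts:
        lowered = text.lower()
        for kws in INSTALL_TYPES.values():
            for kw in kws:
                counts[kw] += lowered.count(kw)
    return counts


def readme_stats(text: str) -> dict[str, int]:
    """Count code blocks, images, hyperlinks and tokens in one Markdown text."""
    links = LINK_RE.findall(text)
    return {
        "code_blocks": len(FENCE_RE.findall(text)) // 2,  # opening + closing fence
        "images": len(IMAGE_RE.findall(text)),
        "hyperlinks": len(links),
        "external_links": sum(is_external_link(u) for u in links),
        "tokens": len(TOKEN_RE.findall(text)),
    }


def repository_stats(data: dict[str, list[str]]) -> dict[str, dict[str, int]]:
    """Aggregate per-result counts into one row of statistics per repository."""
    stats = {}
    for repo, results in data.items():
        totals: Counter = Counter()
        kinds: set[str] = set()
        for text in results:
            totals.update(readme_stats(text))
            kinds |= installation_types(text)
        totals["results"] = len(results)
        totals["installation_methods"] = len(kinds)  # distinct categories, not a sum
        stats[repo] = dict(totals)
    return stats
```

Code blocks are counted as pairs of fence lines, so an unclosed fence is rounded down, and installation methods are counted as distinct categories per repository rather than summed across results. If pandas is available, the returned dict of dicts can be compiled into one table with `pd.DataFrame.from_dict(stats, orient="index")`.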

statistics.py is to be removed.

carlosug commented 4 weeks ago

main updated:

carlosug commented 2 weeks ago

main updated: