AlDanial / cloc

cloc counts blank lines, comment lines, and physical lines of source code in many programming languages.
GNU General Public License v2.0

Reading JSON file kills cloc when counting repo #830

Closed: includesec-erik closed this issue 6 months ago

includesec-erik commented 6 months ago

Describe the bug
When I run cloc on this repo, cloc dies with "Killed".

https://github.com/cisagov/dotgov-data/

It dies when reading this file: https://github.com/cisagov/dotgov-data/blob/main/dotgov-websites/pulse-subdomains-snapshot-06-08-2020-https.json

$ wc dotgov-data/dotgov-websites/pulse-subdomains-snapshot-06-08-2020-https.json
0    75980 13071784 dotgov-data/dotgov-websites/pulse-subdomains-snapshot-06-08-2020-https.json

cloc; OS; OS version

To Reproduce
1) Download this file locally: https://github.com/cisagov/dotgov-data/blob/main/dotgov-websites/pulse-subdomains-snapshot-06-08-2020-https.json
2) Run cloc on the file
3) See this output:

[~/foss/cisa/dotgov-data]
$ cloc .
      23 text files.
      21 unique files.
Killed

Expected result
cloc continues counting and does not die entirely when one file causes problems during counting.

Additional context
I tried raising the timeout from 1 sec to 10 sec, but that didn't fix the issue. Related: https://github.com/AlDanial/cloc/issues/372

AlDanial commented 6 months ago

I'm unable to duplicate this issue on Ubuntu 24.04 LTS. I cloned the repo, then ran:

dotgov-data » cloc .
      23 text files.
      21 unique files.                              
       3 files ignored.

github.com/AlDanial/cloc v 2.01  T=0.76 s (27.6 files/s, 444008.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
CSV                             10              0              0         305072
Text                             1              0              0          32222
Markdown                         5             39              0             56
YAML                             3              2              1             56
Bourne Shell                     1              0              0              6
JSON                             1              0              0              1
-------------------------------------------------------------------------------
SUM:                            21             41              1         337413
-------------------------------------------------------------------------------

dotgov-data » wc dotgov-websites/pulse-subdomains-snapshot-06-08-2020-https.json       
       0    75980 13071784 dotgov-websites/pulse-subdomains-snapshot-06-08-2020-https.json

dotgov-data » cloc dotgov-websites/pulse-subdomains-snapshot-06-08-2020-https.json
       1 text file.
       1 unique file.                              
       0 files ignored.

github.com/AlDanial/cloc v 2.01  T=0.55 s (1.8 files/s, 1.8 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
JSON                             1              0              0              1
-------------------------------------------------------------------------------

Possibly a memory issue? My machine has 64 GB.
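
The `wc` output above reports 0 newlines for a 13,071,784-byte file, so the whole JSON file is a single line. A rough way to spot such files before counting is sketched below. It streams the file in fixed-size chunks so the check itself never holds a whole line in memory. The function name `longest_line_bytes` and the chunk size are illustrative assumptions, and the idea that very long lines matter for memory use here is a guess, not something confirmed in this thread.

```python
def longest_line_bytes(path, chunk_size=1 << 20):
    """Return the length in bytes of the longest line in `path`.

    Hypothetical helper, not part of cloc. Newline bytes are not counted.
    The file is read in chunks, so memory use stays near `chunk_size`
    even when the file consists of one enormous line.
    """
    longest = 0
    current = 0  # length of the line currently being scanned
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            start = 0
            while True:
                nl = chunk.find(b"\n", start)
                if nl == -1:
                    # No more newlines in this chunk; the line continues.
                    current += len(chunk) - start
                    break
                current += nl - start
                longest = max(longest, current)
                current = 0
                start = nl + 1
    # The last line may not end with a newline.
    return max(longest, current)
```

For a file with no newline at all, such as the JSON file in this report, the result equals the file size.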

includesec-erik commented 6 months ago

@AlDanial I'm sorry for wasting your time this weekend. It is exactly as you describe: this was an ephemeral Ubuntu instance from our internal pentest VM cluster that only had 1 GB of RAM.

Confirmed via:

root@ip-10-0-2-41:~# dmesg -T | egrep -i 'killed process'
[Sat May 25 08:52:20 2024] Out of memory: Killed process 34447 (perl) total-vm:719368kB, anon-rss:622152kB, file-rss:2304kB, shmem-rss:0kB, UID:1000 pgtables:1348kB oom_score_adj:0
[Sat May 25 08:54:40 2024] Out of memory: Killed process 34450 (perl) total-vm:718184kB, anon-rss:620808kB, file-rss:2176kB, shmem-rss:0kB, UID:1000 pgtables:1332kB oom_score_adj:0
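
For anyone who wants to pull the same details out of a kernel log automatically, here is a minimal sketch that parses lines shaped like the two above. The regular expression is written against this exact message format only; other kernel versions may word it differently, and `parse_oom_line` is a made-up helper name.

```python
import re
from typing import Optional

# Matches lines like:
#   Out of memory: Killed process 34447 (perl) total-vm:719368kB, anon-rss:622152kB, ...
# Assumption: the format shown in this thread; not guaranteed for every kernel.
OOM_LINE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r".*?anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def parse_oom_line(line: str) -> Optional[dict]:
    """Extract pid, process name, and anonymous RSS (kB) from an OOM-kill line.

    Returns None when the line does not look like an OOM-kill message.
    """
    m = OOM_LINE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m.group("pid")),
        "name": m.group("name"),
        "anon_rss_kb": int(m.group("anon_rss_kb")),
    }
```

Applied to the first line above, this yields pid 34447, name `perl`, and an anonymous RSS of 622152 kB, which is roughly 600 MiB on a VM with 1 GB of RAM.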

If there is any opportunity to improve the UX with better messaging around OOM-killed processes, that'd be great; otherwise I'm closing this out as a non-issue.
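
A process cannot catch SIGKILL, so any friendlier message would have to come from whatever launches cloc rather than from cloc itself. The sketch below is one possible wrapper, not a cloc feature: `describe_exit` and `run_and_report` are hypothetical names, and it assumes a POSIX system where Python's `subprocess` reports death-by-signal as a negative return code (a shell reports the same thing as 128 plus the signal number).

```python
import signal
import subprocess


def describe_exit(returncode: int) -> str:
    """Translate a child's return code into a human-readable hint."""
    if returncode == 0:
        return "ok"
    # subprocess uses -N for "killed by signal N"; shells use 128 + N.
    if returncode < 0:
        signum = -returncode
    elif returncode > 128:
        signum = returncode - 128
    else:
        return f"exited with status {returncode}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = f"signal {signum}"
    if signum == signal.SIGKILL:
        # SIGKILL has many possible senders; the OOM killer is only one.
        return (f"killed by {name}; possibly the kernel OOM killer, "
                "check `dmesg -T` for 'Killed process'")
    return f"terminated by {name}"


def run_and_report(cmd):
    """Run `cmd` (a list of arguments) and return (returncode, hint)."""
    proc = subprocess.run(cmd)
    return proc.returncode, describe_exit(proc.returncode)
```

Calling `run_and_report(["cloc", "."])` on the 1 GB VM would be expected to return a hint pointing at `dmesg`, although the wrapper cannot tell an OOM kill from any other SIGKILL on its own.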

AlDanial commented 6 months ago

Perl's built-in exception handling is kind of lame, so even if I knew where the memory fault happened, it isn't clear I'd be able to do much about it. If you rerun on the VM with -v 3, you might be able to see which subroutine the code was running when it was killed.