AlDanial / cloc

cloc counts blank lines, comment lines, and physical lines of source code in many programming languages.
GNU General Public License v2.0

XML files begin with `<?xml` not `<xml`. Try changing your files to include a question mark after the first `<`. #861

Closed by salix2022 1 month ago

salix2022 commented 1 month ago

Hello AlDanial

> XML files begin with `<?xml` not `<xml`. Try changing your files to include a question mark after the first `<`.

Originally posted by @AlDanial in https://github.com/AlDanial/cloc/issues/856#issuecomment-2379912647

I have checked my files, and the opening `<?xml` is correct. What I found is that when there is a space before the closing `?>` of the XML declaration, cloc counts the file, but when there is no space, cloc does not count it at all. Both styles exist in our code and both work without any issues. Shouldn't cloc count both of them?

`<?xml version="1.0" encoding="Shift-JIS" ?>` OK: included in the count result.

`<?xml version="1.0" encoding="Shift-JIS"?>` NG: not included in the count result.

AlDanial commented 1 month ago

The space before the trailing `?` is irrelevant:

```
» cat 861.xml
<?xml version="1.0" encoding="Shift-JIS"?>
<text><string>abc</string></text>
```

then

```
» cloc 861.xml
github.com/AlDanial/cloc v 2.03  T=0.00 s (264.3 files/s, 528.5 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
XML                              1              0              0              2
-------------------------------------------------------------------------------
```

Post a file that cloc fails on; the failure must have some other cause.
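For reference, the XML 1.0 grammar makes the whitespace before `?>` optional (`XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'`), so both declaration styles from the original report are well-formed XML. The Python sketch below uses only the standard library's `xml.etree.ElementTree` (it is not part of cloc) to check both forms with the small `<text>` body from the example above. Because the input is a `str`, the parser reads it as UTF-8 and ignores the declared Shift-JIS encoding, so this checks only the syntax of the declaration.

```python
import xml.etree.ElementTree as ET

# The two declaration styles from the report, each followed by the
# small example body used in the cloc run above.
BODY = "<text><string>abc</string></text>"
WITH_SPACE = '<?xml version="1.0" encoding="Shift-JIS" ?>\n' + BODY
WITHOUT_SPACE = '<?xml version="1.0" encoding="Shift-JIS"?>\n' + BODY


def is_well_formed(doc: str) -> bool:
    """Return True if the standard-library XML parser accepts doc."""
    try:
        # A str input is parsed as UTF-8, overriding the declared encoding,
        # so this checks the declaration's syntax, not Shift-JIS decoding.
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


for label, doc in (("space before ?>", WITH_SPACE), ("no space before ?>", WITHOUT_SPACE)):
    print(f"{label}: well-formed = {is_well_formed(doc)}")
```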

salix2022 commented 1 month ago

The behavior I'm seeing is quite strange. When a file contains a full-width Japanese dash (―), cloc skips the file entirely; if I remove the dashes, cloc counts the file normally. I'm not sure why this happens.

―――――――――――――――――――――――――


salix2022 commented 1 month ago

Here is the XML file I tested, which cloc ignores for unknown reasons: 861.zip

AlDanial commented 1 month ago

Because of the comment block in this XML file, Perl thinks the file is binary rather than text. cloc skips binary files unless you pass it the `--read-binary-files` switch:

```
» cloc --read-binary-files 861.xml
       1 text file.
       1 unique file.
       0 files ignored.

github.com/AlDanial/cloc v 2.03  T=0.00 s (266.3 files/s, 2663.1 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
XML                              1              0              7              3
-------------------------------------------------------------------------------
```
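Perl's built-in `-T`/`-B` file tests (documented in perlfunc) decide "text or binary" by sampling the start of a file: a NUL byte means binary, valid UTF-8 containing non-ASCII characters means text, and otherwise the file is binary when more than a third of the sampled bytes are "odd" (control codes or bytes with the high bit set). Shift-JIS is not valid UTF-8, and every double-byte Shift-JIS character starts with a high-bit byte, so a comment block full of Japanese characters can push a file over that threshold. The Python function below is a rough approximation of that rule, not cloc's code. The 512-byte sample size, the exact set of tolerated control characters, and the idea that cloc's check behaves like this Perl test are assumptions. The test file is hypothetical; its dash is assumed to be U+2015, which Shift-JIS encodes as the byte pair 0x81 0x5C.

```python
import codecs

# Assumptions (approximating perlfunc's description of -T/-B, not cloc's code):
SAMPLE_SIZE = 512  # how many leading bytes are inspected
ALLOWED_CONTROLS = {0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B}  # \b \t \n \f \r ESC


def looks_binary(data: bytes) -> bool:
    """Rough imitation of Perl's -B heuristic on the start of a file."""
    sample = data[:SAMPLE_SIZE]
    if not sample:
        return False  # this sketch treats empty input as text
    if b"\x00" in sample:
        return True  # any NUL byte marks the file as binary

    # Valid UTF-8 that contains non-ASCII characters counts as text.
    # final=False tolerates a multi-byte character cut off at the sample end.
    try:
        text = codecs.getincrementaldecoder("utf-8")().decode(sample, final=False)
        if any(ord(ch) > 0x7F for ch in text):
            return False
    except UnicodeDecodeError:
        pass  # not UTF-8 (e.g. Shift-JIS): fall through to the odd-byte count

    # "Odd" bytes: high-bit bytes and control codes other than common whitespace.
    odd = sum(1 for b in sample if b >= 0x80 or (b < 0x20 and b not in ALLOWED_CONTROLS))
    return odd * 3 > len(sample)  # more than a third odd -> binary


# Hypothetical file: an XML comment made of four lines of 25 full-width dashes.
HEADER = b'<?xml version="1.0" encoding="Shift-JIS"?>\n<!--\n'
FOOTER = b"-->\n<text/>\n"
sjis_doc = HEADER + (b"\x81\x5c" * 25 + b"\n") * 4 + FOOTER  # 0x81 0x5C per dash
utf8_doc = (HEADER.replace(b"Shift-JIS", b"UTF-8")
            + ("\u2015" * 25 + "\n").encode("utf-8") * 4 + FOOTER)

print("Shift-JIS copy looks binary:", looks_binary(sjis_doc))
print("UTF-8 copy looks binary:", looks_binary(utf8_doc))
```

In the Shift-JIS copy, 100 of its 264 bytes have the high bit set, which crosses the one-third threshold, while the UTF-8 copy decodes cleanly and is treated as text. Under this approximation, that matches what the thread describes: the file becomes countable once the dashes are removed, or when `--read-binary-files` tells cloc to read it anyway.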

(btw your 861.xml has a typo: line 9 is `<job-config` but should be `<job-config>`)
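A start tag missing its closing `>` is not well-formed XML, so any XML parser will reject the file and report roughly where. Here is a minimal way to locate such typos, sketched with Python's standard-library `xml.etree.ElementTree` (not something cloc does): parse the text and return the position of the first error. The `<jobs>` wrapper and `<name>` child are hypothetical stand-ins, since the real 861.xml is not reproduced in this thread.

```python
import xml.etree.ElementTree as ET


def first_parse_error(doc: str):
    """Return (line, column) of the first well-formedness error, or None."""
    try:
        ET.fromstring(doc)
        return None
    except ET.ParseError as err:
        return err.position  # (line, column) reported by the parser


# Hypothetical excerpt: the start tag on line 2 is missing its closing ">".
broken = "<jobs>\n<job-config\n  <name>build</name>\n</job-config>\n</jobs>"
fixed = broken.replace("<job-config\n", "<job-config>\n")

print("broken:", first_parse_error(broken))
print("fixed:", first_parse_error(fixed))
```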

salix2022 commented 1 month ago

Thank you very much!