JuliaEcosystem / PackageAnalyzer.jl

https://juliaecosystem.github.io/PackageAnalyzer.jl/dev/
MIT License

BoundsError: attempt to access empty SubString{String} at index [0] #102

Closed Datseris closed 6 months ago

Datseris commented 6 months ago

I am trying to figure out how PackageAnalyzer would categorize this file:

https://github.com/MattWillFlood/EntropyHub.jl/blob/35c0cd24ae2869f177eb43c76769f3136cd08746/src/_AttnEn.jl

In particular, would it classify the license lines at the end as code, comments, or documentation?

https://github.com/MattWillFlood/EntropyHub.jl/blob/35c0cd24ae2869f177eb43c76769f3136cd08746/src/_AttnEn.jl#L100-L116

Unfortunately, there seems to be a bug.

MWE:

Download the file cited above locally and then run:

using PackageAnalyzer 
file = joinpath(@__DIR__, "attention_entropyhub.jl")
lines = PackageAnalyzer.LineCategories(file)

You will get:

Error showing value of type LineCategories:
ERROR: BoundsError: attempt to access empty SubString{String} at index [0]
Stacktrace:
  [1] prevind(s::SubString{String}, i::Int64, n::Int64)
    @ Base .\strings\basic.jl:502
  [2] prevind
    @ .\strings\basic.jl:497 [inlined]
  [3] show(io::IOContext{Base.TTY}, ::MIME{Symbol("text/plain")}, per_line_category::LineCategories)
    @ PackageAnalyzer.CategorizeLines C:\Users\datse\.julia\packages\PackageAnalyzer\ddM8Z\src\LineCategories.jl:74
  [4] (::REPL.var"#55#56"{REPL.REPLDisplay{REPL.LineEditREPL}, MIME{Symbol("text/plain")}, Base.RefValue{Any}})(io::Any)
    @ REPL C:\Users\datse\.julia\juliaup\julia-1.10.0+0.x64.w64.mingw32\share\julia\stdlib\v1.10\REPL\src\REPL.jl:273
  [5] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
    @ REPL C:\Users\datse\.julia\juliaup\julia-1.10.0+0.x64.w64.mingw32\share\julia\stdlib\v1.10\REPL\src\REPL.jl:569
  [6] display(d::REPL.REPLDisplay, mime::MIME{Symbol("text/plain")}, x::Any)
    @ REPL C:\Users\datse\.julia\juliaup\julia-1.10.0+0.x64.w64.mingw32\share\julia\stdlib\v1.10\REPL\src\REPL.jl:259
  [7] display(d::REPL.REPLDisplay, x::Any)
    @ REPL C:\Users\datse\.julia\juliaup\julia-1.10.0+0.x64.w64.mingw32\share\julia\stdlib\v1.10\REPL\src\REPL.jl:278
  [8] display(x::Any)
    @ Base.Multimedia .\multimedia.jl:340
  [9] #invokelatest#2
    @ .\essentials.jl:887 [inlined]
...

Do you know off the top of your head how these lines are classified? I ask because analyzing the whole package with analyze works.

ericphanson commented 6 months ago

I get

julia> file = joinpath(@__DIR__, "attention_entropyhub.jl")
"/Users/eph/EntropyHub.jl/attention_entropyhub.jl"

julia> lines = PackageAnalyzer.LineCategories(file)
ERROR: SystemError: opening file "/Users/eph/EntropyHub.jl/attention_entropyhub.jl": No such file or directory
Stacktrace:
  [1] systemerror(p::String, errno::Int32; extrainfo::Nothing)
    @ Base ./error.jl:176
  [2] systemerror
    @ ./error.jl:175 [inlined]
  [3] open(fname::String; lock::Bool, read::Nothing, write::Nothing, create::Nothing, truncate::Nothing, append::Nothing)
    @ Base ./iostream.jl:293
  [4] open
    @ ./iostream.jl:275 [inlined]
  [5] open(f::Base.var"#433#434"{String}, args::String; kwargs::@Kwargs{})
    @ Base ./io.jl:394
  [6] open
    @ ./io.jl:393 [inlined]
  [7] read
    @ ./io.jl:486 [inlined]
  [8] parse_green_node(file_path::String)
    @ PackageAnalyzer ~/.julia/packages/PackageAnalyzer/ddM8Z/src/count_loc.jl:102
  [9] LineCategories(path::String)
    @ PackageAnalyzer ~/.julia/packages/PackageAnalyzer/ddM8Z/src/count_loc.jl:117
 [10] top-level scope
    @ REPL[26]:1

when trying that. Is that the right file?

ericphanson commented 6 months ago

BTW for src/_AttnEn.jl I get:

julia> file = "src/_AttnEn.jl"
"src/_AttnEn.jl"

julia> lines = PackageAnalyzer.LineCategories(file)
1    | Code      | module _AttnEn
2    | Code      | export AttnEn
3    | Code      | using StatsBase: Histogram, fit
4    | Docstring |     """
5    | Docstring |         Av4, (Hxx,Hnn,Hxn,Hnx) = AttnEn(Sig)
6    | Docstring |
7    | Docstring |     Returns the attention entropy (`Av4`) calculated as the average of the
8    | Docstring |     sub-entropies (`Hxx`,`Hxn`,`Hnn`,`Hnx`) estimated from the data sequence
9    | Docstring |     (`Sig`) using a base-2 logarithm.
10   | Docstring |
11   | Docstring |         Av4, (Hxx, Hnn, Hxn, Hnx) = AttnEn(Sig::AbstractArray{T,1} where T<:Real; Logx::Real=2)
12   | Docstring |
13   | Docstring |     Returns the attention entropy (`Av4`) and the sub-entropies (`Hxx`,`Hnn`,`Hxn`,`Hnx`)
14   | Docstring |     from the data sequence (`Sig`) where,
15   | Docstring |     Hxx:    entropy of local-maxima intervals
16   | Docstring |     Hnn:    entropy of local minima intervals
17   | Docstring |     Hxn:    entropy of intervals between local maxima and subsequent minima
18   | Docstring |     Hnx:    entropy of intervals between local minima and subsequent maxima
19   | Docstring |
20   | Docstring |     # Arguments:
21   | Docstring |     `Logx`  - Logarithm base, a positive scalar
22   | Docstring |               (Enter 0 for natural logarithm)
23   | Docstring |
24   | Docstring |     See also `EnofEn`, `SpecEn`, `XSpecEn`, `PermEn`, `MSEn`
25   | Docstring |
26   | Docstring |     # References:
27   | Docstring |         [1] Jiawei Yang, et al.,
28   | Docstring |             "Classification of Interbeat Interval Time-series Using
29   | Docstring |             Attention Entropy."
30   | Docstring |             IEEE Transactions on Affective Computing
31   | Docstring |             (2020)
32   | Docstring |
33   | Docstring |
34   | Docstring |     """
35   | Code      |     function AttnEn(Sig::AbstractArray{T,1} where T<:Real; Logx::Real=2)
36   | Blank     |
37   | Code      |     (Logx == 0) ? Logx = exp(1) : nothing
38   | Code      |     N = size(Sig,1)
39   | Code      |     (N > 10) ? nothing : error("Sig:   must be a numeric vector")
40   | Code      |     (Logx>0) ? nothing : error("Logx:  must be a positive number > 0")
41   | Blank     |
42   | Code      |     Xmax = PkFind(Sig)
43   | Code      |     Xmin = PkFind(-Sig)
44   | Code      |     Txx = diff(Xmax)
45   | Code      |     Tnn = diff(Xmin)
46   | Code      |     Temp = diff(sort(vcat(Xmax, Xmin)))
47   | Blank     |
48   | Code      |     if isempty(Xmax)
49   | Code      |         error("No local maxima found!")
50   | Code      |     elseif isempty(Xmin)
51   | Code      |         error("No local minima found!")
52   | Code      |     end
53   | Blank     |
54   | Code      |     (Xmax[1]<Xmin[1]) ? (Txn = Temp[1:2:end]; Tnx = Temp[2:2:end]) :
55   | Code      |             (Txn = Temp[2:2:end]; Tnx = Temp[1:2:end])
56   | Blank     |
57   | Code      |     Edges = -0.5:N
58   | Code      |     Pnx = fit(Histogram,Tnx,Edges).weights
59   | Code      |     Pnn = fit(Histogram,Tnn,Edges).weights
60   | Code      |     Pxx = fit(Histogram,Txx,Edges).weights
61   | Code      |     Pxn = fit(Histogram,Txn,Edges).weights
62   | Blank     |
63   | Code      |     Pnx = Pnx[Pnx.!=0]/size(Tnx,1)
64   | Code      |     Pxn = Pxn[Pxn.!=0]/size(Txn,1)
65   | Code      |     Pnn = Pnn[Pnn.!=0]/size(Tnn,1)
66   | Code      |     Pxx = Pxx[Pxx.!=0]/size(Txx,1)
67   | Blank     |
68   | Code      |     Hxx = -sum(Pxx.*(log.(Logx,Pxx)))
69   | Code      |     Hxn = -sum(Pxn.*(log.(Logx,Pxn)))
70   | Code      |     Hnx = -sum(Pnx.*(log.(Logx,Pnx)))
71   | Code      |     Hnn = -sum(Pnn.*(log.(Logx,Pnn)))
72   | Code      |     Av4 = (Hnn + Hxx + Hxn + Hnx)/4
73   | Blank     |
74   | Code      |     return Av4, (Hxx,Hnn,Hxn,Hnx)
75   | Code      |     end
76   | Blank     |
77   | Code      |     function PkFind(X)
78   | Code      |         Nx = size(X,1)
79   | Code      |         Indx = zeros(Int,Nx);
80   | Code      |         for n = 2:Nx-1
81   | Code      |             if X[n-1]< X[n] > X[n+1]
82   | Code      |                 Indx[n] = n
83   | Blank     |
84   | Code      |             elseif X[n-1] < X[n] == X[n+1]
85   | Code      |                 k = 1
86   | Code      |                 while (n+k)<Nx && X[n] == X[n+k]
87   | Code      |                     k +=1
88   | Code      |                 end
89   | Code      |                 if X[n] > X[n+k]
90   | Code      |                     Indx[n] = n + floor((k-1)/2)
91   | Code      |                 end
92   | Code      |             end
93   | Code      |         end
94   | Code      |         Indx = Indx[Indx.!==0]
95   | Code      |     return Indx
96   | Code      |     end
97   | Blank     |
98   | Code      | end
99   | Blank     |
100  | Code      | """
101  | Code      | Copyright 2021 Matthew W. Flood, EntropyHub
102  | Code      |
103  | Code      | Licensed under the Apache License, Version 2.0 (the "License");
104  | Code      | you may not use this file except in compliance with the License.
105  | Code      | You may obtain a copy of the License at
106  | Code      |
107  | Code      |      http://www.apache.org/licenses/LICENSE-2.0
108  | Code      |
109  | Code      | Unless required by applicable law or agreed to in writing, software
110  | Code      | distributed under the License is distributed on an "AS IS" BASIS,
111  | Code      | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
112  | Code      | See the License for the specific language governing permissions and
113  | Code      | limitations under the License.
114  | Code      |
115  | Code      | For Terms of Use see https://github.com/MattWillFlood/EntropyHub
116  | Code      | """

Here I do think Code is the correct result, since this is a string literal. It is the same as if they had written 2+2: it's just a value not bound to a variable in the module. I agree it is meant to serve as documentation, but they should write it as a docstring or a comment rather than as a bare string.
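To illustrate the point about bare strings, here is a minimal sketch using only `Meta.parse` from Base. It shows that a standalone string literal parses to a plain value, in the same way that `2 + 2` parses to an ordinary expression. The license wording and variable names are placeholders for illustration, not taken from PackageAnalyzer.

```julia
# A bare string literal is just a value: parsing it yields a plain String.
bare_string = Meta.parse("\"Licensed under the Apache License, Version 2.0\"")
println(bare_string isa String)

# Parsing `2 + 2` likewise yields ordinary code (a call expression).
arithmetic = Meta.parse("2 + 2")
println(arithmetic isa Expr)

# One alternative for license text is a block comment, which is not a value:
#=
License text goes here (placeholder).
=#
```

A docstring would also work, but it has to sit directly before the object it documents (for example the module or a function); a string placed after the closing `end` has nothing to attach to.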

Datseris commented 6 months ago

Thanks a lot. Yeah I had messed up the file on my end. Downloading it correctly solved the issue.

And yes, I agree with you that this is a poor way to document the license. But in general the code in this package is a mess so I won't complain :D

ericphanson commented 6 months ago

I am curious about the error since I expected a different one (error opening file) if you just had the wrong path. If it happens again please do file an issue!
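For anyone who wants to reproduce the first frame of the original stack trace in isolation, here is a minimal sketch. It assumes the failing line was empty, which the thread does not confirm. `lastindex` of an empty string is 0, and `prevind` rejects any index that is not greater than zero, so the call throws a `BoundsError`. `safe_prevind` is a hypothetical helper for illustration, not PackageAnalyzer's code.

```julia
# An empty line, held as the same type that appears in the stack trace.
line = SubString("", 1, 0)

# lastindex of an empty string is 0, which prevind rejects
# (it requires 0 < i <= ncodeunits(s) + 1).
caught = try
    prevind(line, lastindex(line))
    nothing
catch err
    err
end
println(caught isa BoundsError)

# Hypothetical guard (not PackageAnalyzer's code): skip the call for empty input.
safe_prevind(s::AbstractString) = isempty(s) ? 0 : prevind(s, lastindex(s))
println(safe_prevind(line))
println(safe_prevind("ab"))
```

Note that the original message starts with "Error showing value of type LineCategories", which suggests the object was constructed and only its display in the REPL failed. Ending the REPL line with a semicolon suppresses the display, which can help when inspecting such a value.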