Closed clorton closed 1 month ago
Would it make sense to add an optional ctor which takes a file/path such that the csv load and read is done for you?
File "/home/jbloedow/PLAY/LASER/laser-chris-nnmm/src/laser_core/demographics/kmestimator.py", line 44, in predict_year_of_death
year_of_death = _pyod(ages_years, self.cumulative_deaths, max_year)
File "/var/opt/idm/venv_laser/lib/python3.10/site-packages/numba/core/dispatcher.py", line 658, in _explain_matching_error
raise TypeError(msg)
TypeError: No matching definition for argument type(s) array(float64, 1d, C), array(float64, 1d, C), uint32
from this code:
# Ages of individuals in years (example data)
ages_years = np.array([40, 50, 60], dtype=np.float64) # Ensure ages_years is float64
# Convert max_year to uint32
max_year = np.uint32(100)
# Predict the year of death for individuals
predicted_year_of_death = estimator.predict_year_of_death(ages_years, max_year=max_year)
Can the code be more robust to the datatypes it's given?
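One way to make the call more forgiving is to coerce the inputs to whatever dtypes the compiled kernel was built for before dispatching to it. The following is only a sketch: the function name coerce_pyod_args and the int32 target dtypes are assumptions for illustration, not taken from laser_core, and the real kernel signature should be substituted.

```python
import numpy as np


def coerce_pyod_args(ages_years, cumulative_deaths, max_year,
                     age_dtype=np.int32, deaths_dtype=np.int32):
    """Normalize arguments before handing them to a compiled kernel.

    The target dtypes are assumptions for illustration; substitute
    whatever signature the real kernel was compiled for.
    """
    ages = np.asarray(ages_years)
    # Refuse to silently truncate fractional (or NaN) ages.
    if np.issubdtype(ages.dtype, np.floating) and not np.all(ages == np.floor(ages)):
        raise ValueError("ages_years contains fractional values")
    ages = np.ascontiguousarray(ages, dtype=age_dtype)
    deaths = np.ascontiguousarray(cumulative_deaths, dtype=deaths_dtype)
    # A plain Python int avoids passing NumPy scalar types such as uint32.
    return ages, deaths, int(max_year)


# The inputs from the report above: float64 ages and a uint32 max_year.
ages, deaths, max_year = coerce_pyod_args(
    np.array([40, 50, 60], dtype=np.float64), [0, 10, 20], np.uint32(100)
)
```

Doing the conversion in the Python wrapper keeps the compiled function to a single signature while letting callers pass lists, float arrays, or NumPy scalars.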
Would it make sense to add an optional ctor which takes a file/path such that the csv load and read is done for you?
The current CSV function is so specific to a particular format that I don't want to tie it to the KME ctor.
Predicted Year of Death for 1M individuals took: 0.008349895477294922 seconds
Predicted Age at Death (in days) for 1M individuals took: 0.018344402313232422 seconds
Which seems good.
It's a two-column CSV where the first column is age in years and the second column is cumulative deaths. I even asked GPT for the least surprising CSV file format for an input file to this class and this was it, with the header line optional. I think it should have built-in support for reading in a file like this. :)
Apologies, I was thinking about the population pyramid CSV with header, min-max, and final bucket particulars. Yes, we can have an optional parameter for a CSV filename/path.
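For reference, a reader for the two-column format described above (age in years, cumulative deaths, header line optional) could look roughly like this. It is a sketch only: load_cumulative_deaths is a hypothetical name, and how the result would be wired into the estimator's constructor is left open.

```python
import csv
import io

import numpy as np


def load_cumulative_deaths(source):
    """Read (age in years, cumulative deaths) rows from a path or file object.

    A first row whose cells are not numeric is treated as a header and skipped.
    """
    def _parse(handle):
        ages, deaths = [], []
        for index, row in enumerate(csv.reader(handle)):
            if not row:
                continue  # skip blank lines
            try:
                age, count = float(row[0]), float(row[1])
            except ValueError:
                if index == 0:
                    continue  # optional header line
                raise
            ages.append(age)
            deaths.append(count)
        return np.array(ages), np.array(deaths)

    if hasattr(source, "read"):
        return _parse(source)
    with open(source, newline="") as handle:
        return _parse(handle)


# Usage with an in-memory file; a real call would pass a path instead.
ages, deaths = load_cumulative_deaths(
    io.StringIO("age,cumulative_deaths\n0,5\n1,8\n2,9\n")
)
```

Detecting the header by attempting a numeric parse means callers do not need a separate flag for this simple format.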
You're making me want to go back and push for a truly built-in load-from-file in the AliasedDistribution/pyramid code. :) We've already got the load_ function provided as a utility, so it seems like not too much of a stretch to use that implicitly if the file provided matches. We could modify the file format to be a bit more standard if we don't like the hyphen separators for age ranges.
Re. population pyramid file format, how about this regularized schema:
AgeMin,AgeMax,Males,Females
min0,max0,males0,females0
min1,max1,males1,females1
...
minN,maxN,malesN,femalesN
The actual header text is probably not important. I would support a hasheader argument, which could be False to indicate a "naked" CSV.
Each bin would be [min,max] - closed interval, i.e., max is a valid year/age for someone in that bin.
Would we enforce maxN==minN?
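To make the proposal concrete, here is a rough sketch of a parser for that schema with a hasheader flag and closed-interval lookup. The names load_pyramid and find_bin are hypothetical, and the sketch deliberately does not check how adjacent bins relate to each other, since that question is still open.

```python
import csv
import io


def load_pyramid(handle, hasheader=True):
    """Parse AgeMin,AgeMax,Males,Females rows into a list of integer tuples."""
    rows = list(csv.reader(handle))
    if hasheader:
        rows = rows[1:]  # the header text itself is ignored
    bins = [tuple(int(cell) for cell in row) for row in rows if row]
    for age_min, age_max, _males, _females in bins:
        if age_min > age_max:
            raise ValueError(f"empty bin: [{age_min}, {age_max}]")
    return bins


def find_bin(bins, age):
    """Return the index of the closed interval [min, max] containing age."""
    for index, (age_min, age_max, _males, _females) in enumerate(bins):
        if age_min <= age <= age_max:
            return index
    raise ValueError(f"no bin contains age {age}")


# A "naked" CSV without a header line.
naked = io.StringIO("0,4,100,95\n5,9,90,88\n10,14,80,79\n")
bins = load_pyramid(naked, hasheader=False)
```

Because each bin is a closed interval, age 4 falls in the first bin and age 5 in the second.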
Do you want to open another issue/ticket for supporting this (since it isn't germane to the Kaplan-Meier Estimator)?
Thanks for accommodating my obsession with input files! :) Just trying one more thing and then will sign off on this.
Good enough for me.
Useful for initializing predicted age at death.
Fixes #43 and #41
Should be reviewed after #64 and #71 are merged to cut down on apparent changes.