Open asmaharry opened 1 year ago
On what system are the files generated? How do you compare them? Is there a difference in PyRadiomics versions? PyRadiomics includes many tests that prevent calculated feature output from changing accidentally. When feature output changes due to bug fixes, the baseline is updated. This is logged in the changelog.
In the past I have noticed some users trying to open the output file (csv) in Excel with the wrong region setting. The output culture in PyRadiomics is en-US, with "." as the decimal symbol. When the file is opened in Excel with "," as the decimal symbol, the value gets transformed into a large integer. Is it possible this occurred in your case?
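To illustrate the decimal-symbol problem described above, here is a minimal sketch. It only simulates a reader that treats "." as a thousands separator and "," as the decimal symbol; the helper `parse_with_comma_decimal` is hypothetical and is not how Excel or PyRadiomics is actually implemented.

```python
def parse_with_comma_decimal(text: str) -> float:
    """Parse text the way a ','-decimal convention would.

    Hypothetical helper: '.' is dropped as a thousands separator
    and ',' is read as the decimal symbol.
    """
    return float(text.replace(".", "").replace(",", "."))


value = 0.123456
text = repr(value)  # written with '.' as the decimal symbol

correct = float(text)                     # read with the en-US convention
misread = parse_with_comma_decimal(text)  # read with the ',' convention

print(correct, misread)
```

Under this assumption a small fractional feature value comes back as a large whole number, which is the kind of discrepancy to look for in the affected columns.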
Thank you for the response. I am using an Ubuntu system and extract the features using the same version of the software (pyradiomics). The extracted features are saved in csv format (new_X_df is the dataframe that contains the features):
    csv_filename = os.path.join(PathToCSVs, filename)
    new_X_df.to_csv(csv_filename, index=False)
I then read the file back like this:
    data = pd.read_csv(filepath)
    X1 = data.to_numpy()
and compare the two numpy arrays (X1 and X2).
Please let me know if you see any problem here. I also observed that the features that cause this problem are wavelet-based features. Many thanks.
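For the comparison step, exact equality is rarely the right test for floating-point features, and it helps to know which features differ and by how much. Below is a minimal sketch, assuming both tables have the same shape and column order. The function `find_mismatches`, the tolerances, and the sample values are illustrative only; none of them come from PyRadiomics.

```python
import math


def find_mismatches(rows_a, rows_b, feature_names, rel_tol=1e-9, abs_tol=0.0):
    """Return (row, feature, a, b) for every value pair that differs beyond tolerance."""
    mismatches = []
    for i, (row_a, row_b) in enumerate(zip(rows_a, rows_b)):
        for name, a, b in zip(feature_names, row_a, row_b):
            if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
                mismatches.append((i, name, a, b))
    return mismatches


# Made-up values for illustration only.
names = ["original_firstorder_Mean", "wavelet-LLH_glszm_ZoneVariance"]
file1 = [[1.5, 0.123456], [2.5, 0.5]]
file2 = [[1.5, 123456.0], [2.5, 0.5]]

result = find_mismatches(file1, file2, names)
for row, name, a, b in result:
    print(f"row {row}, {name}: {a} vs {b}")
```

With numpy arrays such as X1 and X2, the same function can be called on `X1.tolist()` and `X2.tolist()`, with the column names of the dataframe as `feature_names`.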
Writing to csv and reading back means your values are converted to strings and are therefore subject to the current culture. I suspect your error is occurring there. What happens if you save and load using pickle?
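One way to test this suggestion is to round-trip the same values through both formats and compare. The sketch below uses only the Python standard library and made-up values, not the pandas calls from the original post, so it is a check of the idea rather than of that exact pipeline. It assumes the text is written and read with "." as the decimal symbol on both sides.

```python
import csv
import io
import pickle

# Made-up values for illustration only.
values = [0.1, 1 / 3, 2.0 ** -40, 123456.789]

# CSV round trip: floats are written as text and parsed back.
buffer = io.StringIO()
csv.writer(buffer).writerow(values)
buffer.seek(0)
from_csv = [float(field) for field in next(csv.reader(buffer))]

# Pickle round trip: floats are stored in binary form, no text parsing involved.
from_pickle = pickle.loads(pickle.dumps(values))

print("csv identical:   ", from_csv == values)
print("pickle identical:", from_pickle == values)
```

If both round trips reproduce the values, the text conversion by itself is not the source of the difference, and attention should shift to how the file was opened or re-saved in between, or to the extraction itself.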
We observed differences in GLSZM features between two files. File1 contains the features that were extracted 6 months ago. File2 contains recently extracted features. Both File1 and File2 were generated on the same system. When I compared File1 with File2, the maximum error was a six-digit number.
Then I restarted my system and extracted the features for File2 again. Now the maximum difference between File1 and File2 is minimal. In other words, the radiomics features changed after restarting the system.
Has anyone noticed the same problem, or does anyone have an idea what is happening here? Why am I not able to reproduce the texture features for the same dataset? @JoostJM
Thanks.