Closed Henrike-Schwenn closed 2 years ago
05.07.2021 60 min
IMPORTANT: PyCharm Run/Debug Configuration needs to match the file you wish to run!!
import sys
sys.path.append(
"C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/CsvDataset")
import CsvDataset.ClassCSVDataset
print("Huch!")
TrainingSet = CsvDataset.ClassCSVDataset.CsvObject(
"C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets",
"train.csv", "trainingSet")
print(TrainingSet.__class__)
TrainingSet.CreateDataframe()
FileNotFoundError: [Errno 2] No such file or directory: 'train.csv'
Need to restart PC in order to make Python find the new directory / file?
08.07.2021 60 min
Research FileNotFoundError: [Errno 2] No such file or directory: 'train.csv'
Test if Python finds directory / file at all
import os
import sys
for f in os.listdir("C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets"):
    print(f)
sampleSubmission.csv
test.csv
TestCsvFile1.csv
train.csv
It does.
No difference
C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts\python.exe "C:\Program Files\JetBrains\PyCharm Community Edition 2021.1.1\plugins\python-ce\helpers\pydev\pydevd.py" --cmd-line --multiproc --qt-support=auto --client 127.0.0.1 --port 64692 --file C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/CsvDataset/ClassCSVDataset.py
Connected to pydev debugger (build 211.7142.13)
import sys; print('Python %s on %s' % (sys.version, sys.platform))
Python 3.8.3 (tags/v3.8.3:6f8c832, May 13 2020, 22:20:19) [MSC v.1925 32 bit (Intel)] on win32
TrainingSet = CsvObject("C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets",
"train.csv", "trainingSet")
FileNotFoundError: [Errno 2] No such file or directory: 'train.csv'
It does.
class CsvObject:
    def __init__(self, pathCsvDataset="Directory leading to a csv file", csvDataset="Csv file to be read", csvDataframe="Name of Dataframe"):
        self.pathCsvDataset = pathCsvDataset
        self.csvDataset = csvDataset
        self.csvDataframe = pandas.read_csv(self.csvDataset) == csvDataframe #Double equal signs!!
TrainingSet = CsvObject("C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets",
"train.csv", "trainingSet")
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.1.1\plugins\python-ce\helpers\pydev\_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
exec(exp, global_vars, local_vars)
File "<string>", line 1, in <module>
File "C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/CsvDataset/ClassCSVDataset.py", line 12, in __init__
self.csvDataframe = pandas.read_csv(self.csvDataset) == csvDataframe #Double equal signs!!
File "C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\lib\site-packages\pandas\io\parsers.py", line 610, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\lib\site-packages\pandas\io\parsers.py", line 462, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\lib\site-packages\pandas\io\parsers.py", line 819, in __init__
self._engine = self._make_engine(self.engine)
File "C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\lib\site-packages\pandas\io\parsers.py", line 1050, in _make_engine
return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
File "C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\lib\site-packages\pandas\io\parsers.py", line 1867, in __init__
self._open_handles(src, kwds)
File "C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\lib\site-packages\pandas\io\parsers.py", line 1362, in _open_handles
self.handles = get_handle(
File "C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\lib\site-packages\pandas\io\common.py", line 642, in get_handle
handle = open(
FileNotFoundError: [Errno 2] No such file or directory: 'train.csv'
The constructor calls pandas.read_csv. That means Python needs to open the file assigned to "csvDataset" at that point, but os.chdir(self.pathCsvDataset) has not been called yet, so the relative name "train.csv" is looked up in the current working directory.
class CsvObject:
    def __init__(self, pathCsvDataset="Directory leading to a csv file", csvDataset="Csv file to be read", csvDataframe="Name of Dataframe"):
        self.pathCsvDataset = pathCsvDataset
        self.csvDataset = os.chdir(self.pathCsvDataset) == csvDataset
        self.csvDataframe = pandas.read_csv(self.csvDataset) == csvDataframe
Bug fixed.
New bug:
ValueError: Invalid file path or buffer object type: <class 'bool'>
https://github.com/dask/hdfs3/issues/122
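The error is raised by pandas.read_csv: os.chdir() returns None, so os.chdir(self.pathCsvDataset) == csvDataset evaluates to False, and that bool is passed to read_csv as the file path.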
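A minimal sketch of one way to avoid both errors, assuming the intent is to read the file "csvDataset" from the directory "pathCsvDataset": join the two into a full path instead of changing the working directory, and use a single "=" for assignment. The attribute csvPath is a hypothetical addition for illustration, not the project's actual code. pandas is imported inside the method so that the path logic can be checked without it.

```python
import os


class CsvObject:
    """Illustrative rewrite; csvPath is a hypothetical helper attribute."""

    def __init__(self, pathCsvDataset, csvDataset, csvDataframe):
        self.pathCsvDataset = pathCsvDataset  # directory containing the file
        self.csvDataset = csvDataset          # file name, e.g. "train.csv"
        self.csvDataframe = csvDataframe      # label for the dataframe
        # Full path, independent of the current working directory.
        self.csvPath = os.path.join(pathCsvDataset, csvDataset)

    def CreateDataframe(self):
        import pandas  # imported lazily; only needed when the file is read
        return pandas.read_csv(self.csvPath)


# os.chdir() returns None, so comparing its result with a file name
# yields a bool instead of a path.
chdir_result = os.chdir(os.getcwd())
print(chdir_result == "train.csv")
```

With this layout the file is found regardless of which directory PyCharm uses as the working directory for the run configuration.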
19.07.21 50 min
23.07.21 25 min
26.07.21 50 min
Lessons learnt:
Renaming a column can solve problems
DataFrame.dtypes displays the name and data type of each column
Column "count" renamed to "rent_count"
print(trainingSetFirstCycle)
datetime season holiday ... casual registered rent_count
0 2011-01-01 00:00:00 1 0 ... 3 13 16
1 2011-01-01 01:00:00 1 0 ... 8 32 40
2 2011-01-01 02:00:00 1 0 ... 5 27 32
3 2011-01-01 03:00:00 1 0 ... 3 10 13
print(trainingSetFirstCycle.dtypes)
datetime datetime64[ns]
season int64
holiday int64
workingday int64
weather int64
temp float64
atemp float64
humidity int64
windspeed float64
casual int64
registered int64
rent_count float64
dtype: object
Cloning fastai package into PyCharm and installing
pip install fastai
Installation froze
27.07.21 50 min
Lessons learnt: Encoding is important
Retry the installation in the "Predicting Bike Rental Demand" venv
Make sure PyTorch is installed
Installing "torch" in
pip install torch
Error:
Try to run this command from the system terminal. Make sure that you use the correct version of 'pip' installed for your Python interpreter located at 'C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts\python.exe'.
Does pip need to be installed outside venv?
Retrying pip install torch
in Windows PowerShell
PS C:\Users\henri> pip install torch
Collecting torch
Downloading torch-1.9.0-cp39-cp39-win_amd64.whl (222.0 MB)
|████████████████████████████████| 222.0 MB 15 kB/s
Collecting typing-extensions
Downloading typing_extensions-3.10.0.0-py3-none-any.whl (26 kB)
Installing collected packages: typing-extensions, torch
Successfully installed torch-1.9.0 typing-extensions-3.10.0.0
WARNING: You are using pip version 21.1.1; however, version 21.2.1 is available.
You should consider upgrading via the 'c:\users\henri\appdata\local\programs\python\python39\python.exe -m pip install --upgrade pip' command.
PS C:\Users\henri> c:\users\henri\appdata\local\programs\python\python39\python.exe -m pip install --upgrade pip
Requirement already satisfied: pip in c:\users\henri\appdata\local\programs\python\python39\lib\site-packages (21.1.1)
Collecting pip
Downloading pip-21.2.1-py3-none-any.whl (1.6 MB)
|████████████████████████████████| 1.6 MB 1.7 MB/s
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 21.1.1
Uninstalling pip-21.1.1:
Successfully uninstalled pip-21.1.1
Successfully installed pip-21.2.1
Worked
Installing torch in venv still fails
You should consider upgrading via the 'C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts\python.exe -m pip install --upgrade pip' command.
Successfully uninstalled pip-21.1.1
WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
Successfully installed pip-21.2.1
WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
pip uninstall -ip
Try to run this command from the system terminal. Make sure that you use the correct version of 'pip' installed for your Python interpreter located at 'C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts\python.exe'.
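PyCharm's message asks for the pip that belongs to one specific interpreter. A small sketch of how to make that pairing explicit from Python itself: sys.executable is the interpreter currently running, and "python -m pip" runs the pip installed for exactly that interpreter. The helper name pip_command is made up for illustration, and the command is only built and printed here, not executed.

```python
import sys


def pip_command(*args):
    """Build a pip invocation bound to the running interpreter."""
    return [sys.executable, "-m", "pip", *args]


cmd = pip_command("install", "torch")
print("Interpreter:", sys.executable)
print("Command:", " ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Running this with the venv's python.exe and with the global one should print two different interpreter paths, which shows which site-packages folder each pip would install into.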
29.07.21 25 min
Character encoding: assigning written characters to specific bit patterns
https://en.wikipedia.org/wiki/Character_encoding
Windows 10 uses UTF-8 by default. https://techcommunity.microsoft.com/t5/windows-10/windows-10-1903-how-to-change-default-encoding-utf-8-to-ansi-in/m-p/991268
In Python 2 the default source encoding was ASCII; Python 3 defaults to UTF-8. So python.exe is encoded in ASCII? Either switch the encoding to ASCII in the PyCharm settings or add the line # -*- coding: utf-8 -*- to the file.
If you read the error carefully enough, PyCharm tells you everything you need to solve this problem:
SyntaxError: Non-UTF-8 code starting with '\x90' in file C:\Users\pli\AppData\Local\Programs\Python\Python35-32\python.exe on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
Now, as stated in PEP 0263:
Python will default to ASCII as standard encoding if no other encoding hints are given ... To define a source code encoding, a magic comment must be placed into the source files either as first or second line in the file.
Have you tried sticking a magic comment dictating the source code like:
# -*- coding: utf-8 -*-
https://stackoverflow.com/questions/35668967/syntaxerror-in-python-exe-on-pycharm-5-0-4
Lossy Encoding
This inspection warns you of characters that the current document encoding is incapable to represent. For example, when you are
- typing international characters in a document configured to US-ASCII charset. Some characters will be lost on save.
- or loading UTF-8-encoded file using ISO-8859-1 one-byte charset. Some characters will be displayed incorrectly. You fix this by changing the file encoding, either by specifying the encoding directly in the file, e.g. by editing encoding= attribute in the XML prolog of XML file, or configuring the Settings|Project Settings|File Encodings .
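The two cases in the inspection text can be reproduced with a few lines of Python. This is only an illustration of lossy encoding in general, using a made-up sample string; it does not touch any project file.

```python
text = "Grüße"

# Case 1: international characters saved in a US-ASCII document.
# Characters that ASCII cannot represent are replaced and lost.
as_ascii = text.encode("ascii", errors="replace")
print(as_ascii)

# Case 2: UTF-8 bytes loaded with a one-byte charset (ISO-8859-1).
# Each byte is shown as a separate, wrong character.
utf8_bytes = "ü".encode("utf-8")
misread = utf8_bytes.decode("iso-8859-1")
print(misread)

# Decoding with the matching charset restores the original character.
restored = utf8_bytes.decode("utf-8")
print(restored)
```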
03.08.21 30 min
Non-ASCII characters
Reports code that uses non-ASCII symbols in suspicious context. For example:
- Non-ASCII characters in identifiers, strings, or comments
- Identifiers written in different languages, such as myCollection with letter C written in Cyrillic.
- Comments or strings containing Unicode symbols, such as long dashes and arrows
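A short sketch of what such a check might look like, to make the Cyrillic example concrete. The function name non_ascii_characters is hypothetical; this is not how PyCharm implements the inspection.

```python
import unicodedata


def non_ascii_characters(text):
    """Return (character, Unicode name) pairs for every non-ASCII character."""
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in text if ord(ch) > 127]


# "\u0421" is CYRILLIC CAPITAL LETTER ES, which looks like a Latin "C".
suspicious = "my\u0421ollection"
print(non_ascii_characters(suspicious))
print(non_ascii_characters("myCollection"))
```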
05.08.21 30 min
Issue submitted to JetBrains support:
Dear JetBrains support team,
I tried to install the package "torch", which produced the following error:
Try to run this command from the system terminal. Make sure that you use the correct version of 'pip' installed for your Python interpreter located at 'C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts\python.exe'.
When I checked the file "python.exe", it turned out to have been falsely encoded in UTF-8, which Windows uses by default. This is what the content of the file currently looks like:
MZ [unprintable binary bytes] !This program cannot be run in DOS mode. [remainder of the binary dump of python.exe omitted]
Turning to "Inspections" I found the following warnings:
Lossy Encoding
This inspection warns you of characters that the current document encoding is incapable to represent. For example, when you are
- typing international characters in a document configured to US-ASCII charset. Some characters will be lost on save.
or loading UTF-8-encoded file using ISO-8859-1 one-byte charset. Some characters will be displayed incorrectly. You fix this by changing the file encoding, either by specifying the encoding directly in the file, e.g. by editing encoding= attribute in the XML prolog of XML file, or configuring the Settings|Project Settings|File Encodings .
Non-ASCII characters
Reports code that uses non-ASCII symbols in suspicious context. For example:
- Non-ASCII characters in identifiers, strings, or comments
- Identifiers written in different languages, such as myCollection with letter C written in Cyrillic.
- Comments or strings containing Unicode symbols, such as long dashes and arrows
So "python.exe" can't be run on DOS, contains non-ASCII characters and probably needs to be reloaded in the correct encoding. However, every encoding to choose from contains a warning or an error. I understand that reloading in another encoding could change or damage the file. So how do I figure out which encoding to choose?
Kind regards
Henrike Schwenn
11.08.21 60 min
Antonina Belianskaya (IntelliJ)
Aug 6, 2021, 14:38 GMT+2
Hi Henrike, Thank you for contacting PyCharm support.
A package installation failure in most cases is not IDE-related behavior. It is more about a venv/python/package-specific. It can be tested, by installing the package in the same venv OUT of PyCharm in cmd/terminal. Here is a guide on how to perform this https://intellij-support.jetbrains.com/hc/en-us/articles/360010202240
Open the command line and run:
C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts\activate.bat
pip install torch
What will be the result?
The "python.exe" is hardly related here, it looks the same on my machine and does not affect the package installation process.
Kind regards, Tonya https://www.jetbrains.com The Drive to Develop
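The reply fits with python.exe being a compiled program rather than a text file, so no text encoding applies to it. A minimal sketch of how to tell the two apart: Windows executables start with the two bytes "MZ", and their contents are generally not valid UTF-8. The sample bytes below are made up for illustration and are not read from the real python.exe.

```python
def looks_like_windows_executable(data):
    """Windows executables (.exe, .dll) begin with the DOS signature b'MZ'."""
    return data[:2] == b"MZ"


def is_valid_utf8(data):
    """True if the bytes can be decoded as UTF-8 text."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Made-up sample: the 'MZ' signature followed by a byte that is not valid UTF-8.
sample = b"MZ\x90\x00\x03"
print(looks_like_windows_executable(sample))
print(is_valid_utf8(sample))
```

To check a real file, the first bytes could be read with open(path, "rb").read(2) and passed to the first function.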
Windows PowerShell
Copyright (C) Microsoft Corporation. All rights reserved.
Try the new cross-platform PowerShell – https://aka.ms/pscore6
PS C:\Users\henri> cd C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts\activate.bat
cd : Cannot find path "C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts\activate.bat" because it does not exist.
At line:1 char:1
+ cd C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_r ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (C:\Users\henri\...ts\activate.bat:String) [Set-Location], ItemNotFoundException
+ FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.SetLocationCommand
PS C:\Users\henri> chdir C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts\activate.bat
chdir : Cannot find path "C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts\activate.bat" because it does not exist.
At line:1 char:1
+ chdir C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bik ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : ObjectNotFound: (C:\Users\henri\...ts\activate.bat:String) [Set-Location], ItemNotFoundException
+ FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.SetLocationCommand
PS C:\Users\henri> cd C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts
PS C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts> ls
Directory: C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---l 04.05.2021 09:07 2350 activate
-a---l 04.05.2021 09:07 1027 activate.bat
-a---l 04.05.2021 09:07 18454 Activate.ps1
-a---l 04.05.2021 09:07 368 deactivate.bat
-a---l 04.05.2021 14:17 97199 f2py.exe
-a---l 27.07.2021 18:27 97204 pip.exe
-a---l 27.07.2021 18:27 97204 pip3.8.exe
-a---l 27.07.2021 18:27 97204 pip3.exe
-a---l 04.05.2021 09:06 420936 python.exe
-a---l 04.05.2021 09:06 419912 pythonw.exe
PS C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts> pip install torch
Requirement already satisfied: torch in c:\users\henri\appdata\local\programs\python\python39\lib\site-packages (1.9.0)
Requirement already satisfied: typing-extensions in c:\users\henri\appdata\local\programs\python\python39\lib\site-packages (from torch) (3.10.0.0)
WARNING: You are using pip version 21.2.1; however, version 21.2.3 is available.
You should consider upgrading via the 'c:\users\henri\appdata\local\programs\python\python39\python.exe -m pip install --upgrade pip' command.
PS C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts> c:\users\henri\appdata\local\programs\python\python39\python.exe -m pip install --upgrade pip
Requirement already satisfied: pip in c:\users\henri\appdata\local\programs\python\python39\lib\site-packages (21.2.1)
Collecting pip
Downloading pip-21.2.3-py3-none-any.whl (1.6 MB)
|████████████████████████████████| 1.6 MB 2.2 MB/s
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 21.2.1
Uninstalling pip-21.2.1:
Successfully uninstalled pip-21.2.1
Successfully installed pip-21.2.3
PS C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts> pip install torch
Requirement already satisfied: torch in c:\users\henri\appdata\local\programs\python\python39\lib\site-packages (1.9.0)
Requirement already satisfied: typing-extensions in c:\users\henri\appdata\local\programs\python\python39\lib\site-packages (from torch) (3.10.0.0)
PS C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts>
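torch seems to have been installed successfully, although the "Requirement already satisfied" paths point to the global Python 3.9 site-packages rather than to the venv.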
Try to import torch into TrainingSet.py
Still doesn't work: "No module named 'torch'"
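A quick way to see why the import fails even though pip reported success is to ask the interpreter that runs the script where it would load a module from. A sketch using only the standard library; the helper name module_location is made up for illustration.

```python
import importlib.util
import sys


def module_location(name):
    """Return where `name` would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


print("Interpreter:", sys.executable)
print("json  ->", module_location("json"))
print("torch ->", module_location("torch"))  # None if torch is missing for this interpreter
```

Run from PyCharm with the project interpreter, this should show whether torch is visible to that interpreter or only to the one PowerShell uses.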
Try 'pip install fastai' in Powershell
PS C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts> pip install fastai
Successfully installed MarkupSafe-2.0.1 blis-0.7.4 catalogue-2.0.4 certifi-2021.5.30 charset-normalizer-2.0.4 click-7.1.2 colorama-0.4.4 cycler-0.10.0 cymem-2.0.5 fastai-2.5.1 fastcore-1.3.26 fastdownload-0.0.5 fastprogress-1.0.0 idna-3.2 jinja2-3.0.1 joblib-1.0.1 kiwisolver-1.3.1 matplotlib-3.4.2 murmurhash-1.0.5 numpy-1.21.1 packaging-21.0 pandas-1.3.1 pathy-0.6.0 pillow-8.3.1 preshed-3.0.5 pydantic-1.8.2 pyparsing-2.4.7 python-dateutil-2.8.2 pytz-2021.1 pyyaml-5.4.1 requests-2.26.0 scikit-learn-0.24.2 scipy-1.7.1 six-1.16.0 smart-open-5.1.0 spacy-3.1.1 spacy-legacy-3.0.8 srsly-2.4.1 thinc-8.0.8 threadpoolctl-2.2.0 torchvision-0.10.0 tqdm-4.62.0 typer-0.3.2 urllib3-1.26.6 wasabi-0.8.2
'fastai' successfully installed via Powershell
Follow the instructions: PyCharm can't install/import a package/library/module
Troubleshooting:
Try installing/importing a package from the system terminal (outside of PyCharm) using the same interpreter/environment.
Understanding results:
If it fails with the same error as in PyCharm - the problem is most likely not related to PyCharm. Search the web for similar problems and possible solutions (StackOverflow, python forums, etc.). It is likely to be related to pip, your environment or some compatibility issue.
If it is installed/executed successfully - just to be sure, check one more time that you are using the same environment/interpreter and if so, file an issue to PyCharm issue tracker providing the information as described in the paragraph below.
(venv) C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Sprachtechnologie\Predicting_Bike_Rental_Demand\venv\Scripts> pip install torch
Fatal error in launcher: Unable to create process using '"c:\users\henri\onedrive\dokumente\berufseinstieg\sprachtechnologie\prediciting_bike_rental_demand\venv\scripts\python.exe" "C:\U
sers\henri\OneDrive\Dokumente\Berufseinstieg\Sprachtechnologie\Predicting_Bike_Rental_Demand\venv\Scripts\pip.exe" install torch': The system cannot find the file specified.
Compare paths
Powershell: PS C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts> pip install torch
PyCharm: (venv) C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Sprachtechnologie\Predicting_Bike_Rental_Demand\venv\Scripts>
Difference: Folder Sprachtechnologie
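Comparing long paths by eye is error-prone, so here is a small sketch that lists the components found in only one of two Windows paths. The function name differing_components is hypothetical; the two paths are copied from the prompts above.

```python
from pathlib import PureWindowsPath


def differing_components(first, second):
    """Return the path components that occur in only one of two Windows paths.

    Components are compared case-insensitively, as Windows does.
    """
    a = [part.lower() for part in PureWindowsPath(first).parts]
    b = [part.lower() for part in PureWindowsPath(second).parts]
    return [p for p in a if p not in b], [p for p in b if p not in a]


powershell = r"C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts"
pycharm = r"C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Sprachtechnologie\Predicting_Bike_Rental_Demand\venv\Scripts"

only_powershell, only_pycharm = differing_components(powershell, pycharm)
print(only_powershell)
print(only_pycharm)
```

Besides the folder Sprachtechnologie, the project folder names differ too (one ends in "-master"). Applied to the launcher error above, the same comparison would also flag that the python.exe path spells the folder "prediciting_bike_rental_demand" while the prompt shows "Predicting_Bike_Rental_Demand".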
Try C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts> pip install torch in PyCharm:
(venv) C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts> pip install torch
WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
Collecting torch
Using cached torch-0.1.2.post2.tar.gz (128 kB)
Requirement already satisfied: pyyaml in c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages (from torch) (5.4.1)
WARNING: The candidate selected for download or install is a yanked version: 'torch' candidate (version 0.1.2.post2 at https://files.pythonhosted.org/packages/f8/02/880b468bd382dc79896eae
cbeb8ce95e9c4b99a24902874a2cef0b562cea/torch-0.1.2.post2.tar.gz#sha256=a43e37f8f927c5b18f80cd163daaf6a1920edafcab5102e02e3e14bb97d9c874 (from https://pypi.org/simple/torch/))
Reason for being yanked: 0.1.2 is past it's support date and confuses users on unsupported platforms
Using legacy 'setup.py install' for torch, since package 'wheel' is not installed.
WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
Installing collected packages: torch
Running setup.py install for torch ... error
ERROR: Command errored out with exit status 1:
command: 'c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\scripts\python.exe' -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0]
= '"'"'C:\\Users\\henri\\AppData\\Local\\Temp\\pip-install-k7l5d9hx\\torch_982e5be03e574abea0284b2d0d08fe2f\\setup.py'"'"'; __file__='"'"'C:\\Users\\henri\\AppData\\Local\\Temp\\pip-insta
ll-k7l5d9hx\\torch_982e5be03e574abea0284b2d0d08fe2f\\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools
import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\henri\AppData\Local\T
emp\pip-record-_9lo0nt8\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand
-master\venv\include\site\python3.8\torch'
cwd: C:\Users\henri\AppData\Local\Temp\pip-install-k7l5d9hx\torch_982e5be03e574abea0284b2d0d08fe2f\
Complete output (23 lines):
running install
running build_deps
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\henri\AppData\Local\Temp\pip-install-k7l5d9hx\torch_982e5be03e574abea0284b2d0d08fe2f\setup.py", line 225, in <module>
setup(name="torch", version="0.1.2.post2",
File "c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages\setuptools\__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "C:\Users\henri\AppData\Local\Programs\Python\Python38-32\lib\distutils\core.py", line 148, in setup
dist.run_commands()
File "C:\Users\henri\AppData\Local\Programs\Python\Python38-32\lib\distutils\dist.py", line 966, in run_commands
self.run_command(cmd)
File "C:\Users\henri\AppData\Local\Programs\Python\Python38-32\lib\distutils\dist.py", line 985, in run_command
cmd_obj.run()
File "C:\Users\henri\AppData\Local\Temp\pip-install-k7l5d9hx\torch_982e5be03e574abea0284b2d0d08fe2f\setup.py", line 99, in run
self.run_command('build_deps')
File "C:\Users\henri\AppData\Local\Programs\Python\Python38-32\lib\distutils\cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "C:\Users\henri\AppData\Local\Programs\Python\Python38-32\lib\distutils\dist.py", line 985, in run_command
cmd_obj.run()
File "C:\Users\henri\AppData\Local\Temp\pip-install-k7l5d9hx\torch_982e5be03e574abea0284b2d0d08fe2f\setup.py", line 51, in run
from tools.nnwrap import generate_wrappers as generate_nn_wrappers
ModuleNotFoundError: No module named 'tools.nnwrap'
----------------------------------------
ERROR: Command errored out with exit status 1: 'c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\scripts\python.exe' -u -c 'import io, os, sys, s
etuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\henri\\AppData\\Local\\Temp\\pip-install-k7l5d9hx\\torch_982e5be03e574abea0284b2d0d08fe2f\\setup.py'"'"'; __file__='"'"'C:\\Users\\henri
\\AppData\\Local\\Temp\\pip-install-k7l5d9hx\\torch_982e5be03e574abea0284b2d0d08fe2f\\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else
io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --recor
d 'C:\Users\henri\AppData\Local\Temp\pip-record-_9lo0nt8\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\henri\onedrive\dokumente\berufseinst
ieg\predicting_bike_rental_demand-master\venv\include\site\python3.8\torch' Check the logs for full command output.
WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
WARNING: You are using pip version 21.2.1; however, version 21.2.3 is available.
You should consider upgrading via the 'c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\scripts\python.exe -m pip install --upgrade pip' command.
Well, that's some improvement.
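A possible explanation for pip picking the yanked torch 0.1.2.post2 source archive is that no prebuilt wheel matched the venv interpreter. The debugger banner earlier in this log reports Python 3.8.3 as a 32-bit build, while the PowerShell install downloaded a cp39 win_amd64 wheel. This is an assumption, not something the log confirms. The sketch below only shows how to check the version and bitness of whichever interpreter runs it.

```python
import platform
import struct
import sys

# Size of a pointer in bits: 32 on a 32-bit build, 64 on a 64-bit build.
pointer_bits = struct.calcsize("P") * 8

print("Python", platform.python_version(), f"({pointer_bits}-bit)")
print("Executable:", sys.executable)
```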
16.08.21 75 min
Lessons learned:
Retry installing torch via powershell C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Sprachtechnologie\Predicting_Bike_Rental_Demand\venv\Scripts>
PS C:\Users\henri> cd C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Sprachtechnologie\Predicting_Bike_Rental_Demand\venv\Scripts
PS C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Sprachtechnologie\Predicting_Bike_Rental_Demand\venv\Scripts> pip install torch
Requirement already satisfied: torch in c:\users\henri\appdata\local\programs\python\python39\lib\site-packages (1.9.0)
Requirement already satisfied: typing-extensions in c:\users\henri\appdata\local\programs\python\python39\lib\site-packages (from torch) (3.10.0.0)
Retry pip install torch
in PyCharm
(venv) C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Sprachtechnologie\Predicting_Bike_Rental_Demand\venv\Scripts> pip install torch
Fatal error in launcher: Unable to create process using '"c:\users\henri\onedrive\dokumente\berufseinstieg\sprachtechnologie\prediciting_bike_rental_demand\venv\scripts\python.exe" "C:\U
sers\henri\OneDrive\Dokumente\Berufseinstieg\Sprachtechnologie\Predicting_Bike_Rental_Demand\venv\Scripts\pip.exe" install torch': The system cannot find the file specified.
Just noticed that I was supposed to compare the interpreters, not the directories.
So PyCharm is using an older Python interpreter (3.8 in the venv, versus 3.9 in PowerShell). This is probably the root of the problem.
Changed interpreter via File|Settings:
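After switching the interpreter, a short script can confirm which Python is actually in use and whether it runs inside a virtual environment. A sketch using only the standard library; the helper name in_virtualenv is made up for illustration.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv (prefix differs from base)."""
    return sys.prefix != sys.base_prefix


print("Version:    ", ".".join(map(str, sys.version_info[:3])))
print("Executable: ", sys.executable)
print("Prefix:     ", sys.prefix)
print("Inside venv:", in_virtualenv())
```

If the output in PyCharm and in PowerShell differs, packages installed from one are not visible to the other.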
Draft for JetBrains Issue Tracker
Tried to install torch in PyCharm: PS C:\Users\henri> cd C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts pip install torch
Error:
Try to run this command from the system terminal. Make sure that you use the correct version of 'pip' installed for your Python interpreter located at 'C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts\python.exe'.
Turned to support: Advised to run C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts\activate.bat pip install torch
in Windows PowerShell. Try to install torch in the same directory but outside PyCharm.
Ran C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts> pip install torch
because activate.bat is a file, so C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts\activate.bat
isn't a directory I can move to.
Result: torch installed successfully, upgraded pip
Requirement already satisfied: torch in c:\users\henri\appdata\local\programs\python\python39\lib\site-packages (1.9.0)
Requirement already satisfied: typing-extensions in c:\users\henri\appdata\local\programs\python\python39\lib\site-packages (from torch) (3.10.0.0)
WARNING: You are using pip version 21.2.1; however, version 21.2.3 is available.
You should consider upgrading via the 'c:\users\henri\appdata\local\programs\python\python39\python.exe -m pip install --upgrade pip' command.
PS C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts> c:\users\henri\appdata\local\programs\python\python39\python.exe -m pip install --upgrade pip
Requirement already satisfied: pip in c:\users\henri\appdata\local\programs\python\python39\lib\site-packages (21.2.1)
Collecting pip
Downloading pip-21.2.3-py3-none-any.whl (1.6 MB)
|████████████████████████████████| 1.6 MB 2.2 MB/s
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 21.2.1
Uninstalling pip-21.2.1:
Successfully uninstalled pip-21.2.1
Successfully installed pip-21.2.3
Retried pip install torch
in PyCharm, which produced the following error message:
(venv) C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts> pip install torch
WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
Collecting torch
Using cached torch-0.1.2.post2.tar.gz (128 kB)
Requirement already satisfied: pyyaml in c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages (from torch) (5.4.1)
WARNING: The candidate selected for download or install is a yanked version: 'torch' candidate (version 0.1.2.post2 at https://files.pythonhosted.org/packages/f8/02/880b468bd382dc79896eae
cbeb8ce95e9c4b99a24902874a2cef0b562cea/torch-0.1.2.post2.tar.gz#sha256=a43e37f8f927c5b18f80cd163daaf6a1920edafcab5102e02e3e14bb97d9c874 (from https://pypi.org/simple/torch/))
Reason for being yanked: 0.1.2 is past it's support date and confuses users on unsupported platforms
Using legacy 'setup.py install' for torch, since package 'wheel' is not installed.
WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
Installing collected packages: torch
Running setup.py install for torch ... error
ERROR: Command errored out with exit status 1:
command: 'c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\scripts\python.exe' -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0]
= '"'"'C:\\Users\\henri\\AppData\\Local\\Temp\\pip-install-k7l5d9hx\\torch_982e5be03e574abea0284b2d0d08fe2f\\setup.py'"'"'; __file__='"'"'C:\\Users\\henri\\AppData\\Local\\Temp\\pip-insta
ll-k7l5d9hx\\torch_982e5be03e574abea0284b2d0d08fe2f\\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools
import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\henri\AppData\Local\T
emp\pip-record-_9lo0nt8\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand
-master\venv\include\site\python3.8\torch'
cwd: C:\Users\henri\AppData\Local\Temp\pip-install-k7l5d9hx\torch_982e5be03e574abea0284b2d0d08fe2f\
Complete output (23 lines):
running install
running build_deps
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\henri\AppData\Local\Temp\pip-install-k7l5d9hx\torch_982e5be03e574abea0284b2d0d08fe2f\setup.py", line 225, in <module>
setup(name="torch", version="0.1.2.post2",
File "c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages\setuptools\__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "C:\Users\henri\AppData\Local\Programs\Python\Python38-32\lib\distutils\core.py", line 148, in setup
dist.run_commands()
File "C:\Users\henri\AppData\Local\Programs\Python\Python38-32\lib\distutils\dist.py", line 966, in run_commands
self.run_command(cmd)
File "C:\Users\henri\AppData\Local\Programs\Python\Python38-32\lib\distutils\dist.py", line 985, in run_command
cmd_obj.run()
File "C:\Users\henri\AppData\Local\Temp\pip-install-k7l5d9hx\torch_982e5be03e574abea0284b2d0d08fe2f\setup.py", line 99, in run
self.run_command('build_deps')
File "C:\Users\henri\AppData\Local\Programs\Python\Python38-32\lib\distutils\cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "C:\Users\henri\AppData\Local\Programs\Python\Python38-32\lib\distutils\dist.py", line 985, in run_command
cmd_obj.run()
File "C:\Users\henri\AppData\Local\Temp\pip-install-k7l5d9hx\torch_982e5be03e574abea0284b2d0d08fe2f\setup.py", line 51, in run
from tools.nnwrap import generate_wrappers as generate_nn_wrappers
ModuleNotFoundError: No module named 'tools.nnwrap'
----------------------------------------
ERROR: Command errored out with exit status 1: 'c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\scripts\python.exe' -u -c 'import io, os, sys, s
etuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\henri\\AppData\\Local\\Temp\\pip-install-k7l5d9hx\\torch_982e5be03e574abea0284b2d0d08fe2f\\setup.py'"'"'; __file__='"'"'C:\\Users\\henri
\\AppData\\Local\\Temp\\pip-install-k7l5d9hx\\torch_982e5be03e574abea0284b2d0d08fe2f\\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else
io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --recor
d 'C:\Users\henri\AppData\Local\Temp\pip-record-_9lo0nt8\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\henri\onedrive\dokumente\berufseinst
ieg\predicting_bike_rental_demand-master\venv\include\site\python3.8\torch' Check the logs for full command output.
WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
WARNING: You are using pip version 21.2.1; however, version 21.2.3 is available.
You should consider upgrading via the 'c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\scripts\python.exe -m pip install --upgrade pip' command.
Followed the instructions of support article: PyCharm can't install/import a package/library/module
Troubleshooting: Try installing/importing a package from the system terminal (outside of PyCharm) using the same interpreter/environment. Understanding results: If it fails with the same error as in PyCharm - the problem is most likely not related to PyCharm. Search the web for similar problems and possible solutions (StackOverflow, python forums, etc.). It is likely to be related to pip, your environment or some compatibility issue. If it is installed/executed successfully - just to be sure, check one more time that you are using the same environment/interpreter and if so, file an issue to PyCharm issue tracker providing the information as described in the paragraph below.
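So torch has been installed successfully from the system terminal. Note that the paths in the two logs differ, though: the system terminal used Python 3.9 (c:\users\henri\appdata\local\programs\python\python39), whereas the PyCharm venv runs on Python 3.8 32-bit (Python38-32 in the traceback), so this was probably not the same interpreter/environment. That may also be why pip in the venv fell back to the yanked source distribution torch 0.1.2.post2.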
- Checked one more time that I was using the same environment/interpreter.
Please provide any additional information below. Attach a code sample, a screenshot, or a screencast if possible. Please attach the IDE logs. You can get them by selecting "Collect Logs and Diagnostic Data" from the "Help" menu.
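One way to check whether the system terminal and the PyCharm venv really use the same interpreter is to print the interpreter path, version, and bitness in both places and compare the results. A minimal sketch using only the standard library (the helper name describe_interpreter is made up for illustration):

```python
import platform
import sys


def describe_interpreter():
    """Collect the facts that identify the running Python interpreter."""
    return {
        "executable": sys.executable,              # full path to the python binary
        "version": platform.python_version(),      # e.g. "3.8.3"
        "bits": platform.architecture()[0],        # "32bit" or "64bit"
        "in_venv": sys.prefix != sys.base_prefix,  # True inside a venv
    }


info = describe_interpreter()
for key, value in info.items():
    print(f"{key}: {value}")
```

Running the same file once from the system terminal and once from the PyCharm terminal should show immediately whether the two differ.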
07.09.21 50 min
import re
import numpy as np
import pandas as pd

def add_datepart(df, fldname, drop=True):
    fld = df[fldname]
    if not np.issubdtype(fld.dtype, np.datetime64):
        df[fldname] = fld = pd.to_datetime(fld, infer_datetime_format=True)
    targ_pre = re.sub('[Dd]ate$', '', fldname)
    for n in ('Year', 'Month', 'Week', 'Day', 'Dayofweek', 'Dayofyear',
              'Is_month_end', 'Is_month_start', 'Is_quarter_end', 'Is_quarter_start', 'Is_year_end', 'Is_year_start'):
        df[targ_pre+n] = getattr(fld.dt,n.lower())
    df[targ_pre+'Elapsed'] = fld.astype(np.int64) // 10**9
    if drop: df.drop(fldname, axis=1, inplace=True)
Understanding the code
def add_datepart(df, fldname, drop=True):
Parameters are the dataframe, the name of the field containing the datetime you want to split, and the option to drop said field. After you've split the datetime field into several numerical fields, you no longer need it.
fld = df[fldname]
A short alias for the column itself (a pandas Series, not just its name), to ease the typing of the code.
if not np.issubdtype(fld.dtype, np.datetime64):
df[fldname] = fld = pd.to_datetime(fld, infer_datetime_format=True)
If the values aren't formatted as datetime, convert them to datetime.
targ_pre = re.sub('[Dd]ate$', '', fldname)
Use the re.sub function to strip a trailing "Date" or "date" from the column name. The result is a string that serves as the prefix for the names of the new columns.
for n in ('Year', 'Month', 'Week', 'Day', 'Dayofweek', 'Dayofyear',
'Is_month_end', 'Is_month_start', 'Is_quarter_end', 'Is_quarter_start', 'Is_year_end', 'Is_year_start'):
df[targ_pre+n] = getattr(fld.dt,n.lower())
Extract each of these pieces of information (year, month, week, etc.) from the datetime field and store it in a new column named targ_pre plus the attribute name.
df[targ_pre+'Elapsed'] = fld.astype(np.int64) // 10**9
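Converts the datetime values to 64-bit integers (nanoseconds since 1970-01-01 00:00:00) and divides them by 10**9 without remainder, which yields the seconds elapsed since that date.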
if drop: df.drop(fldname, axis=1, inplace=True)
Drop the original datetime field if drop is True.
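To make the two less obvious lines concrete, here is a small sketch using only the standard library. It mirrors the re.sub call and the integer division by 10**9 from the function above on made-up inputs. The helper names prefix_for and elapsed_seconds are hypothetical and not part of fastai.

```python
import re
from datetime import datetime, timezone


def prefix_for(fldname):
    """Strip a trailing 'Date' or 'date' from a column name."""
    return re.sub('[Dd]ate$', '', fldname)


def elapsed_seconds(moment):
    """Whole seconds since 1970-01-01 00:00:00 UTC."""
    # Nanoseconds since the epoch, which is what astype(np.int64)
    # yields for a datetime64[ns] value.
    nanoseconds = int(moment.timestamp()) * 10**9
    return nanoseconds // 10**9


print(prefix_for("saledate"))   # sale
print(prefix_for("datetime"))   # datetime (the name does not end in "date")
print(elapsed_seconds(datetime(2011, 1, 1, tzinfo=timezone.utc)))  # 1293840000
```

Because "datetime" does not end in "date", the prefix stays "datetime", which matches column names such as datetimeElapsed further down in this log.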
Try
trainingSetFirstCycle=add_datepart(trainingSetFirstCycle, datetime, drop=True)
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.1.1\plugins\python-ce\helpers\pydev\pydevd.py", line 1483, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.1.1\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/TrainingSet.py", line 15, in <module>
trainingSetFirstCycle=add_datepart(trainingSetFirstCycle, datetime, drop=True)
NameError: name 'add_datepart' is not defined
Process finished with exit code -1
Suggested solution: from fastai.tabular.core import add_datepart
Works
08.09.21 25 min
Try
trainingSetFirstCycle=add_datepart(trainingSetFirstCycle, trainingSetFirstCycle.datetime, drop=True)
Result
raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 01:00:00',\n
Researching the error
Comparing the documentation of "add_datepart" -> Values of "datetime" need to be listed in the definition of the dataframe!
15.09.21 50 min
https://cmdlinetips.com/2018/11/how-to-split-a-text-column-in-pandas/
Doesn't work here because "datetime" doesn't contain strings.
trainingSetFirstCycle[['date','time']]=trainingSetFirstCycle.datetime.str.split(" ", expand=True,)
AttributeError: Can only use .str accessor with string values!
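For a column that already has a datetime dtype, the counterpart of .str is the .dt accessor. A minimal sketch on made-up rows, assuming pandas is imported as in the rest of this log (the small frame is illustrative, not the real training set):

```python
import datetime

import pandas

# Made-up stand-in for the training set: one datetime64 column.
frame = pandas.DataFrame(
    {"datetime": pandas.to_datetime(["2011-01-01 00:00:00", "2011-01-01 01:00:00"])}
)

# .dt exposes the date and time components of every row.
frame["date"] = frame["datetime"].dt.date
frame["time"] = frame["datetime"].dt.time

print(frame)
```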
https://stackoverflow.com/questions/53653980/split-datetime-in-pandas
Suggestion to convert the column to a datetime dtype with pd.to_datetime instead.
Ran trainingSetFirstCycle.datetime=pandas.to_datetime(trainingSetFirstCycle.datetime), still datetime64[ns]. But let's see whether this line has any effect on "add_datepart".
It has not.
trainingSetFirstCycle=({'datetime': pandas.date_range('2011-01-01 00:00:00', periods=5)})
trainingSetFirstCycle['date'] = [d.date() for d in trainingSetFirstCycle['datetime']]
trainingSetFirstCycle['time'] = [d.time() for d in trainingSetFirstCycle['datetime']]
AttributeError: 'dict' object has no attribute 'datetime'
Try trainingSetFirstCycle.assign(date=trainingSetFirstCycle.datetime.dt.date, time=trainingSetFirstCycle.datetime.dt.time)
raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 01:00:00',\n '2011-01-01 02:00:00', '2011-01-01 03:00:00',\n '2011-01-01 04:00:00', '2011-01-01 05:00:00',\n '2011-01-01 06:00:00', '2011-01-01 07:00:00',\n '2011-01-01 08:00:00', '2011-01-01 09:00:00',\n ...\n '2012-12-19 14:00:00', '2012-12-19 15:00:00',\n '2012-12-19 16:00:00', '2012-12-19 17:00:00',\n '2012-12-19 18:00:00', '2012-12-19 19:00:00',\n '2012-12-19 20:00:00', '2012-12-19 21:00:00',\n '2012-12-19 22:00:00', '2012-12-19 23:00:00'],\n dtype='datetime64[ns]', length=10886, freq=None)] are in the [columns]"
Back to key error. Needs to be fixed before performing any operations on "datetime".
trainingSetFirstCycle = pandas.read_csv(
"C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/train.csv",
low_memory=False, parse_dates=["datetime"],
{'datetime': ['2011-01-01 00:00:00', '2011-01-01 01:00:00', '2011-01-01 02:00:00', '2011-01-01 03:00:00']})
SyntaxError: positional argument follows keyword argument
trainingSetFirstCycle = pandas.read_csv(
"C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/train.csv",
{'datetime': ['2011-01-01 00:00:00', '2011-01-01 01:00:00', '2011-01-01 02:00:00', '2011-01-01 03:00:00']},
low_memory=False, parse_dates=["datetime"])
AttributeError: 'dict' object has no attribute 'encode'
parse_dates=["datetime"]
- It might be the encoding referred to
trainingSetFirstCycle = pandas.read_csv(
"C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/train.csv",
{'datetime': ['2011-01-01 00:00:00', '2011-01-01 01:00:00', '2011-01-01 02:00:00', '2011-01-01 03:00:00']},
low_memory=False)
AttributeError: 'dict' object has no attribute 'encode'
Try
index=datetime
trainingSetFirstCycle = pandas.read_csv(
"C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/train.csv",
index='datetime', low_memory=False, parse_dates=['datetime'])
TypeError: read_csv() got an unexpected keyword argument 'index'
parse_dates : bool or list of int or names or list of lists or dict, default False
The behavior is as follows: boolean. If True -> try parsing the index. list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column. list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column. dict, e.g. {‘foo’ : [1, 3]} -> parse columns 1, 3 as date and call result ‘foo’ If a column or index cannot be represented as an array of datetimes, say because of an unparsable value or a mixture of timezones, the column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv. To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with utc=True. See Parsing a CSV with mixed timezones for more. Note: A fast-path exists for iso8601-formatted dates.
parse_dates=['datetime']
from dataframe definition -> datetime object
trainingSetFirstCycle.datetime.to_string()
index_col=['datetime']
trainingSetFirstCycle = pandas.read_csv(
"C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/train.csv",
index_col=['datetime'], low_memory=False, parse_dates=['datetime'])
Result: "datetime" is turned into the index and is no longer a column of its own.
trainingSetFirstCycle.dtypes
season int64
holiday int64
workingday int64
weather int64
temp float64
atemp float64
humidity int64
windspeed float64
casual int64
registered int64
rent_count float64
dtype: object
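For comparison, a sketch of the two read_csv variants on an in-memory CSV with made-up rows: parse_dates alone keeps "datetime" as a regular column, while index_col moves it into the index. reset_index() turns the index back into a column if needed.

```python
import io

import pandas

csv_text = (
    "datetime,season,rent_count\n"
    "2011-01-01 00:00:00,1,16\n"
    "2011-01-01 01:00:00,1,40\n"
)

# Variant 1: "datetime" is parsed as dates and stays a column.
as_column = pandas.read_csv(io.StringIO(csv_text), parse_dates=["datetime"])

# Variant 2: "datetime" becomes the index and disappears from the columns.
as_index = pandas.read_csv(
    io.StringIO(csv_text), index_col="datetime", parse_dates=["datetime"]
)

print(list(as_column.columns))               # ['datetime', 'season', 'rent_count']
print(list(as_index.columns))                # ['season', 'rent_count']
print(list(as_index.reset_index().columns))  # ['datetime', 'season', 'rent_count']
```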
[x] Check the original csv file:
Is "datetime" a single column to begin with?
It is :(
[x] infer_datetime_format
infer_datetime_format : bool, default False
If True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by 5-10x.
trainingSetFirstCycle = pandas.read_csv(
"C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/train.csv",
low_memory=False, parse_dates=['datetime'], infer_datetime_format=True)
raise KeyError(f"None of [{key}] are in the [{axis_name}]")
[ ] keep_date_col
[ ] date_parser
date_parser : function, optional
Function to use for converting a sequence of string columns to an array of datetime instances. The default uses dateutil.parser.parser to do the conversion. Pandas will try to call date_parser in three different ways, advancing to the next if an exception occurs: 1) Pass one or more arrays (as defined by parse_dates) as arguments; 2) concatenate (row-wise) the string values from the columns defined by parse_dates into a single array and pass that; and 3) call date_parser once for each row using one or more strings (corresponding to the columns defined by parse_dates) as arguments.
trainingSetFirstCycle = pandas.read_csv(
"C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/train_datetime_split.csv",
low_memory=False, parse_dates=["date"])
ValueError: Missing column provided to 'parse_dates': 'date'
See if the column causes trouble without 'parse_dates': 'date'
trainingSetFirstCycle = pandas.read_csv(
"C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/train_datetime_split.csv",
low_memory=False)
trainingSetFirstCycle.rent_count = numpy.log(trainingSetFirstCycle.rent_count)
AttributeError: 'DataFrame' object has no attribute 'rent_count'
Take another look at the csv. Looks okay.
Just run definition of df
trainingSetFirstCycle = pandas.read_csv(
"C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/train_datetime_split.csv",
low_memory=False)
Looks just perfect
trainingSetFirstCycle.dtypes
date;time;season;holiday;workingday;weather;temp;atemp;humidity;windspeed;casual;registered;rent_count object
dtype: object
But it's not. Apparently, pandas has parsed all columns as one: the header is separated by semicolons, while read_csv splits on commas by default.
trainingSetFirstCycle.dtypes
date object
time object
season int64
holiday int64
workingday int64
weather int64
temp float64
atemp float64
humidity int64
windspeed float64
casual int64
registered int64
rent_count int64
dtype: object
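The single-column result above has semicolons in its header, which suggests the file is semicolon-delimited. How the file was re-read between the two dtypes listings is not recorded in this log; one way that would produce the second listing is passing sep=";" to read_csv. A sketch on an in-memory file with made-up rows:

```python
import io

import pandas

csv_text = "date;time;season\n2011-01-01;00:00:00;1\n2011-01-01;01:00:00;1\n"

# Default separator is a comma, so every line ends up in a single column.
one_column = pandas.read_csv(io.StringIO(csv_text))

# Telling pandas about the semicolon yields one column per field.
three_columns = pandas.read_csv(io.StringIO(csv_text), sep=";")

print(one_column.shape)     # (2, 1)
print(three_columns.shape)  # (2, 3)
```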
trainingSetFirstCycle = pandas.read_csv(
"C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/train_datetime_split.csv",
low_memory=False, parse_dates=["date"])
trainingSetFirstCycle.dtypes
date datetime64[ns]
trainingSetFirstCycle = add_datepart(trainingSetFirstCycle, trainingSetFirstCycle.date, drop=True)
KeyError: "None of [DatetimeIndex(['2011-01-01', '2011-01-01', '2011-01-01', '2011-01-01',\n .....dtype='datetime64[ns]', length=10886, freq=None)] are in the [columns]"
Hey everyone,
I'm trying to split a date column in a pandas data frame using add_datepart( ).
trainingSetFirstCycle = pandas.read_csv(
"C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/train_datetime_split.csv",
low_memory=False, parse_dates=["date"])
trainingSetFirstCycle.rent_count = numpy.log(trainingSetFirstCycle.rent_count)
trainingSetFirstCycle
date time season ... casual registered rent_count
0 2011-01-01 00:00:00 1 ... 3 13 2.772589
1 2011-01-01 01:00:00 1 ... 8 32 3.688879
2 2011-01-01 02:00:00 1 ... 5 27 3.465736
3 2011-01-01 03:00:00 1 ... 3 10 2.564949
4 2011-01-01 04:00:00 1 ... 0 1 0.000000
... ... ... ... ... ... ...
10881 2012-12-19 19:00:00 4 ... 7 329 5.817111
10882 2012-12-19 20:00:00 4 ... 10 231 5.484797
10883 2012-12-19 21:00:00 4 ... 4 164 5.123964
10884 2012-12-19 22:00:00 4 ... 12 117 4.859812
10885 2012-12-19 23:00:00 4 ... 4 84 4.477337
[10886 rows x 13 columns]
trainingSetFirstCycle.dtypes
date datetime64[ns]
However, running trainingSetFirstCycle = add_datepart(trainingSetFirstCycle, trainingSetFirstCycle.date, drop=True)
returns this error message:
raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 01:00:00',\n ... dtype='datetime64[ns]', length=10886, freq=None)] are in the [columns]"
I checked the documentation to see what I'd done wrong.
In the example shown, the definition of the data frame includes a dictionary consisting of the column name "date" and a list containing its first four values. So I reproduced this in my own data frame:
trainingSetFirstCycle = pandas.read_csv(
"C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/train_datetime_split.csv",
{'date': ['2011-01-01', '2011-01-01 ', '2011-01-01', '2011-01-01']}, low_memory=False, parse_dates=["date"])
The result was this error message:
AttributeError: 'dict' object has no attribute 'encode'
.
So, do you have an idea what it is I'm missing here? Thanks in advance.
Comment StackOverflow
add_datepart wants the name of the column, not its values – Paul H Oct 26 at 15:37
-> Conclusion: Function doesn't seem to realize trainingSetFirstCycle.date
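is the column's name. In fact it is not the name: it evaluates to the column's values (a Series), which add_datepart then tries to use as column labels.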
I had assumed the column should be referred to this way because of the function numpy.log(trainingSetFirstCycle.rent_count)
-> I'd supposed whenever you give a data frame column as a parameter to a function, you have to refer to it as df.column
-> Tried several variations
trainingSetFirstCycle = add_datepart(trainingSetFirstCycle, "date", drop=True)
worked
This has been a semantic bug. trainingSetFirstCycle = add_datepart(trainingSetFirstCycle, trainingSetFirstCycle.date, drop=True) failed at runtime with a KeyError because trainingSetFirstCycle.date passes the column's values, whereas add_datepart expects the column's name as a string. I changed the parameter to simply "date", which has solved the problem.
Apparently, my misconception was that whenever you give a data frame column as a parameter to a function, you have to refer to it as df.column, because that was the syntax of the call numpy.log(trainingSetFirstCycle.rent_count), where the data frame column "rent_count" is the parameter of the function numpy.log. numpy.log works on the values, so it needs the column itself; add_datepart looks the column up by its name.
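The difference between a column's name and its values can be reproduced without fastai. In this sketch, pick_column is a hypothetical stand-in that, like add_datepart, looks the column up with df[fldname]; the frame holds made-up rows.

```python
import pandas

frame = pandas.DataFrame(
    {"date": pandas.to_datetime(["2011-01-01", "2011-01-02"]), "rent_count": [16, 40]}
)


def pick_column(df, fldname):
    """Hypothetical stand-in: look a column up by its name."""
    return df[fldname]


by_name = pick_column(frame, "date")  # works: "date" is a column label

try:
    # frame["date"] is a Series of values; pandas treats each value as a label.
    pick_column(frame, frame["date"])
    error_message = None
except KeyError as error:
    error_message = str(error)

print(error_message)
```

The printed message should be of the same kind as the KeyError in this log: none of the date values are column labels.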
Idea: Midnight as 0 + Number of minutes (int)
trainingSetFirstCycle = pandas.read_csv(
"C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/train_datetime_split.csv",
low_memory=False, parse_dates=["date", "time"])
trainingSetFirstCycle.dtypes
date datetime64[ns]
time datetime64[ns]
Try
trainingSetFirstCycle.time = pandas.Timestamp(trainingSetFirstCycle.time, unit='s')
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "pandas\_libs\tslibs\timestamps.pyx", line 1332, in pandas._libs.tslibs.timestamps.Timestamp.__new__
File "pandas\_libs\tslibs\conversion.pyx", line 445, in pandas._libs.tslibs.conversion.convert_to_tsobject
TypeError: Cannot convert input [0 2021-11-08 00:00:00
1 2021-11-08 01:00:00
2 2021-11-08 02:00:00
3 2021-11-08 03:00:00
4 2021-11-08 04:00:00
...
10881 2021-11-08 19:00:00
10882 2021-11-08 20:00:00
10883 2021-11-08 21:00:00
10884 2021-11-08 22:00:00
10885 2021-11-08 23:00:00
Name: time, Length: 10886, dtype: datetime64[ns]] of type <class 'pandas.core.series.Series'> to Timestamp
Is it because I turned in the name of a column or because the column doesn't contain dates?
pandas.Timestamp(1513393355.5, unit='s')
Timestamp('2017-12-16 03:02:35.500000')
pandas.Timestamp(52345, unit='s')
Timestamp('1970-01-01 14:32:25')
pandas.Timestamp(1,unit='s')
Timestamp('1970-01-01 00:00:01')
This is how .Timestamp (unit='s') works:
- Take a single float or int (one scalar value, not a whole column like trainingSetFirstCycle.time)
- Return date + time
- Start counting on 1970-01-01 00:00:00
Conclusion Processing date as a separate column seems to be more trouble than it's worth. Keep column "datetime" for the first cycle.
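The "midnight as 0 plus number of minutes" idea from the start of this entry does not need pandas.Timestamp. A minimal sketch, assuming the time-of-day values already sit in a datetime64 column (made-up rows; the column name minutes_since_midnight is hypothetical):

```python
import pandas

frame = pandas.DataFrame(
    {
        "time": pandas.to_datetime(
            ["2021-11-08 00:00:00", "2021-11-08 01:30:00", "2021-11-08 23:00:00"]
        )
    }
)

# Hours and minutes are available per row through the .dt accessor.
frame["minutes_since_midnight"] = frame["time"].dt.hour * 60 + frame["time"].dt.minute

print(frame["minutes_since_midnight"].tolist())  # [0, 90, 1380]
```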
Detect missing values.
Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, gets mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).
Returns: DataFrame. Mask of bool values for each element in DataFrame that indicates whether an element is an NA value.
Detect missing values for an array-like object. This function takes a scalar or array-like object and indicates whether values are missing (NaN in numeric arrays, None or NaN in object arrays, NaT in datetimelike).
Parameters: obj : scalar or array-like. Object to check for null or missing values.
Returns: bool or array-like of bool. For scalar input, returns a scalar boolean. For array input, returns an array of boolean indicating whether each corresponding element is missing.
Run
trainingSetFirstCycle.isnull()
season holiday ... datetimeIs_year_start datetimeElapsed
0 False False ... False False
1 False False ... False False
2 False False ... False False
3 False False ... False False
4 False False ... False False
... ... ... ... ...
10881 False False ... False False
10882 False False ... False False
10883 False False ... False False
10884 False False ... False False
10885 False False ... False False
[10886 rows x 24 columns]
You don't see every value. How do I make sure I'm not missing true values?
pandas.isna(trainingSetFirstCycle)
season holiday ... datetimeIs_year_start datetimeElapsed
0 False False ... False False
1 False False ... False False
2 False False ... False False
3 False False ... False False
4 False False ... False False
... ... ... ... ...
10881 False False ... False False
10882 False False ... False False
10883 False False ... False False
10884 False False ... False False
10885 False False ... False False
[10886 rows x 24 columns]
Next steps
Teesorten, Kommunikationsblocker, Unsinnige Zeichenfolgen
"Gunpowder", "Sich an einem Fischbrötchen verschlucken", "bet0934vnar"
"Darjeeling", "Unmoralische Angebote", "35fknvf"
"Sencha", , "gwoenv634"
test.isnull()
Teesorten Kommunikationsblocker Unsinnige Zeichenfolgen
0 False False False
1 False False False
2 False False False
By default, isnull() will not treat an empty string as missing value.
Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).
pd.options.mode.use_inf_as_na = True
test.isnull()
Teesorten Kommunikationsblocker Unsinnige Zeichenfolgen
0 False False False
1 False False False
2 False False False
numpy.Inf
IEEE 754 floating point representation of (positive) infinity. Use inf because Inf, Infinity, PINF and infty are aliases for inf. For more details, see inf.
Inf is short for infinity.
Rerun on actual df.
pandas.options.mode.use_inf_as_na = True
trainingSetFirstCycle.isnull()
season holiday ... datetimeIs_year_start datetimeElapsed
0 False False ... False False
1 False False ... False False
2 False False ... False False
3 False False ... False False
4 False False ... False False
... ... ... ... ...
10881 False False ... False False
10882 False False ... False False
10883 False False ... False False
10884 False False ... False False
10885 False False ... False False
[10886 rows x 24 columns]
Column 1, Column 2, Column 3
5,7,8
27,235,346
62,7,
24,,345
Two missing values
test = pd.read_csv("C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/TestCsvFile2.csv")
test
Column 1 Column 2 Column 3
0 5 7.0 8.0
1 27 235.0 346.0
2 62 7.0 NaN
3 24 NaN 345.0
test.isna()
Column 1 Column 2 Column 3
0 False False False
1 False False False
2 False False True
3 False True False
Conclusion: pandas.isna() does flag empty values (they are read as NaN), but if some are hidden in trainingSetFirstCycle, the truncated printout of isna() won't point them out to me.
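To avoid reading the truncated printout, the boolean mask can be reduced to counts. A sketch on the four test rows from above, read from an in-memory string instead of TestCsvFile2.csv:

```python
import io

import pandas

csv_text = "Column 1,Column 2,Column 3\n5,7,8\n27,235,346\n62,7,\n24,,345\n"
test = pandas.read_csv(io.StringIO(csv_text))

missing_per_column = test.isna().sum()             # number of NaN per column
total_missing = int(missing_per_column.sum())      # number of NaN overall
rows_with_missing = test[test.isna().any(axis=1)]  # only the affected rows

print(missing_per_column)
print(total_missing)  # 2
print(rows_with_missing)
```

The same three lines applied to trainingSetFirstCycle would show at a glance whether any column contains missing values, regardless of how many rows the frame has.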
An interesting little topic that came up during the week is the proc_df function. proc_df does the following:
It finds numerical columns with missing values, creates an additional boolean column, and replaces the missing values with the median. It also turns the categorical objects into integer codes.
1—Introduction to Random Forests
Test
testFilled = fastai.structured.proc_df(test)
Traceback (most recent call last):
File "<input>", line 1, in <module>
AttributeError: module 'fastai' has no attribute 'structured'
from fastai import structured
Traceback (most recent call last):
File "<input>", line 1, in <module>
ImportError: cannot import name 'structured' from 'fastai' (C:\Users\henri\AppData\Local\Programs\Python\Python39\lib\site-packages\fastai\__init__.py)
proc_df is contained in fastai.structured, which fails to be imported.
ImportError: cannot import name
import fastai
from fastai.tabular.core import add_datepart
from fastai import *
NameError: name 'add_datepart' is not defined
Meaning import * can't access module.submodule.submodule
from fastai.tabular.core import add_datepart
from fastai.structured.proc_DF import *
ModuleNotFoundError: No module named 'fastai.structured'
The problem seems to be that PyCharm doesn't find the module 'fastai.structured'. Does it work outside PyCharm?
Windows PowerShell
Copyright (C) Microsoft Corporation. Alle Rechte vorbehalten.
Lernen Sie das neue plattformübergreifende PowerShell kennen – https://aka.ms/pscore6
PS C:\Users\henri> python
Python 3.9.5 (tags/v3.9.5:0a7dcbd, May 3 2021, 17:27:52) [MSC v.1928 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
import fastai
import fastai.structured
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'fastai.structured'
- [x] Check if fastai.structured is still up to date
"structured" and "proc_DF" produce no results in the fastai documentation. I think they are not used in the current version anymore.
Current function according to [documentation](https://docs.fast.ai/tabular.core.html#FillMissing)
class FillMissing [source]
`FillMissing(fill_strategy=median, add_col=True, fill_vals=None)` :: TabularProc
Fill the missing values in continuous columns.
trainingSetFirstCycle = FillMissing
trainingSetFirstCycle = FillMissing(trainingSetFirstCycle)
Works!
from fastcore.basics import range_of
splits = RandomSplitter()(range_of(trainingSetFirstCycle_main))
NameError: name 'RandomSplitter' is not defined
from fastai.data.transforms import RandomSplitter
splits = RandomSplitter()(range_of(trainingSetFirstCycle_main))
NameError: name 'trainingSetFirstCycle_main' is not defined
I should follow the entire integration example
Integration example
For a more in-depth explanation, see the tabular tutorial
path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')
df_main,df_test = df.iloc[:10000].copy(),df.iloc[10000:].copy()
df_test.drop('salary', axis=1, inplace=True)
df_main.head()
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [Categorify, FillMissing, Normalize]
splits = RandomSplitter()(range_of(df_main))
to = TabularPandas(df_main, procs, cat_names, cont_names, y_names="salary", splits=splits)
dls = to.dataloaders()
dls.valid.show_batch()
What's df.iloc?
pandas.DataFrame.iloc
property DataFrame.iloc
Purely integer-location based indexing for selection by position. .iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array. Allowed inputs are: An integer, e.g. 5. A list or array of integers, e.g. [4, 3, 0]. A slice object with ints, e.g. 1:7. A boolean array. A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above). This is useful in method chains, when you don’t have a reference to the calling object, but would like to base your selection on some value. .iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing (this conforms with python/numpy slice semantics).
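In the integration example, df.iloc is what splits the frame by row position into a main part and a test part. A sketch of the same slicing on a made-up five-row frame (split at row 3 instead of row 10000):

```python
import pandas

df = pandas.DataFrame(
    {
        "age": [25, 32, 47, 51, 38],
        "salary": ["<50k", ">=50k", "<50k", ">=50k", "<50k"],
    }
)

# Rows 0..2 go to the main part, rows 3..4 to the test part.
df_main, df_test = df.iloc[:3].copy(), df.iloc[3:].copy()

# The test part should not contain the target column.
df_test.drop("salary", axis=1, inplace=True)

print(len(df_main), len(df_test))  # 3 2
print(list(df_test.columns))       # ['age']
```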
What's TabularPandas?
class TabularPandas [source]
TabularPandas(df, procs=None, cat_names=None, cont_names=None, y_names=None, y_block=None, splits=None, do_setup=True, device=None, inplace=False, reduce_memory=True) :: Tabular
A Tabular object with transforms
This does the job
trainingFirstCycle = TabularPandas(dfFirstCycle_train, procs, cat_names, cont_names, y_names="rent_count", splits=splits)
trainingSetFirstCycle.rent_count
into a separate feather format
dfFirstCycle_validate.rent_count.to_feather(validation_y_nameFilePath)
AttributeError: 'Series' object has no attribute 'to_feather'
y_name_validation = dfFirstCycle_validate.rent_count
y_name_validation.to_feather("C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/FirstCycle/y_name_validation.ftr")
AttributeError: 'Series' object has no attribute 'to_feather'
A single column is a pandas Series, which has no to_feather method. Use pandas.Series.to_csv instead.
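If the column should end up in a feather file after all, one option is to wrap the Series in a one-column DataFrame with Series.to_frame(), since DataFrame does have to_feather. A sketch with made-up values; the to_feather call is left as a comment because it needs the optional pyarrow dependency and a real path:

```python
import pandas

y_name_validation = pandas.Series([2.772589, 3.688879, 3.465736], name="rent_count")

# A Series has no to_feather, but a one-column DataFrame does.
y_frame = y_name_validation.to_frame()

print(type(y_frame).__name__)  # DataFrame
print(list(y_frame.columns))   # ['rent_count']

# y_frame.to_feather("y_name_validation.ftr")  # requires pyarrow
```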
What I need to save training set in feather
import feather
import pandas as pd
# pingInfo comes from the quoted example and is not defined in this excerpt
pingInfoFilePath = "./serverpings.ftr"
dataFrame = pd.DataFrame(data=pingInfo)
dataFrame.to_feather(pingInfoFilePath)
Saving training set
dfFirstCycle_train.to_feather("C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/FirstCycle/trainingFir
Only works with a pandas dataframe.
User Current Version:- 3.9.5 (tags/v3.9.5:0a7dcbd, May 3 2021, 17:27:52) [MSC v.1928 64 bit (AMD64)]
inside PyCharm