Henrike-Schwenn / Predicting_bike_rental_demand

My first AI project as part of my take on the amazing online course "Introduction to Machine Learning for Coders" taught by Jeremy Howard. I will be contributing to the Kaggle competition "Bike Sharing Demand", aiming to predict bike rental demand depending on the weather.

Create dataframe "Training Set" #18

Closed Henrike-Schwenn closed 2 years ago

Henrike-Schwenn commented 3 years ago

05.07.2021 60 min

IMPORTANT: PyCharm Run/Debug Configuration needs to match the file you wish to run!!

import sys

sys.path.append(
    "C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/CsvDataset")
import CsvDataset.ClassCSVDataset

print("Huch!")
TrainingSet = CsvDataset.ClassCSVDataset.CsvObject(
    "C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets",
    "train.csv", "trainingSet")
print(TrainingSet.__class__)
TrainingSet.CreateDataframe()

FileNotFoundError: [Errno 2] No such file or directory: 'train.csv'

Do I need to restart the PC in order to make Python find the new directory / file?

Henrike-Schwenn commented 3 years ago

08.07.2021 60 min

Research FileNotFoundError: [Errno 2] No such file or directory: 'train.csv'

import os
import sys
for f in os.listdir("C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets"):
    print(f)

sampleSubmission.csv
test.csv
TestCsvFile1.csv
train.csv

train.csv does exist in the directory.

No difference

C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts\python.exe "C:\Program Files\JetBrains\PyCharm Community Edition 2021.1.1\plugins\python-ce\helpers\pydev\pydevd.py" --cmd-line --multiproc --qt-support=auto --client 127.0.0.1 --port 64692 --file C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/CsvDataset/ClassCSVDataset.py
Connected to pydev debugger (build 211.7142.13)
import sys; print('Python %s on %s' % (sys.version, sys.platform))
Python 3.8.3 (tags/v3.8.3:6f8c832, May 13 2020, 22:20:19) [MSC v.1925 32 bit (Intel)] on win32

TrainingSet = CsvObject("C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets",
    "train.csv", "trainingSet")

FileNotFoundError: [Errno 2] No such file or directory: 'train.csv'

It does.

class CsvObject:

    def __init__(self, pathCsvDataset="Directory leading to a csv file", csvDataset="Csv file to be read", csvDataframe="Name of Dataframe"):
        self.pathCsvDataset = pathCsvDataset
        self.csvDataset = csvDataset
        self.csvDataframe = pandas.read_csv(self.csvDataset) == csvDataframe  #Double equal signs!!
TrainingSet = CsvObject("C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets",
    "train.csv", "trainingSet")
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.1.1\plugins\python-ce\helpers\pydev\_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<string>", line 1, in <module>
  File "C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/CsvDataset/ClassCSVDataset.py", line 12, in __init__
    self.csvDataframe = pandas.read_csv(self.csvDataset) == csvDataframe  #Double equal signs!!
  File "C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\lib\site-packages\pandas\io\parsers.py", line 610, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\lib\site-packages\pandas\io\parsers.py", line 462, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\lib\site-packages\pandas\io\parsers.py", line 819, in __init__
    self._engine = self._make_engine(self.engine)
  File "C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\lib\site-packages\pandas\io\parsers.py", line 1050, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\lib\site-packages\pandas\io\parsers.py", line 1867, in __init__
    self._open_handles(src, kwds)
  File "C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\lib\site-packages\pandas\io\parsers.py", line 1362, in _open_handles
    self.handles = get_handle(
  File "C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\lib\site-packages\pandas\io\common.py", line 642, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: 'train.csv'

The constructor calls pandas.read_csv. That means Python needs to access the file assigned to "csvDataset" right away, but it hasn't been given the command os.chdir(self.pathCsvDataset) yet, so the bare file name "train.csv" is looked up in the current working directory.
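The diagnosis can be reproduced without pandas. A bare file name such as "train.csv" is resolved against the current working directory, so it is only found if the process happens to run inside the Datasets folder. The following is a minimal sketch under stated assumptions: temporary directories stand in for the real folders, and `check_lookup`, `dataset_dir` and `csv_name` are illustrative names that do not appear in the project.

```python
import os
import tempfile
from pathlib import Path
from typing import Tuple


def check_lookup(dataset_dir: str, csv_name: str) -> Tuple[bool, bool]:
    """Return (found via bare file name, found via directory + file name)."""
    found_relative = os.path.exists(csv_name)  # resolved against os.getcwd()
    found_joined = (Path(dataset_dir) / csv_name).exists()
    return found_relative, found_joined


original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as dataset_dir, tempfile.TemporaryDirectory() as elsewhere:
    # Hypothetical stand-in for Datasets/train.csv
    (Path(dataset_dir) / "train.csv").write_text("datetime,count\n")
    os.chdir(elsewhere)  # run from a directory that does not contain the file
    try:
        result = check_lookup(dataset_dir, "train.csv")
    finally:
        os.chdir(original_cwd)  # restore before the temp dirs are removed

print(result)
```

Under these assumptions the bare name is not found while the joined path is, which matches the FileNotFoundError above and suggests that no restart is needed.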

class CsvObject:

    def __init__(self, pathCsvDataset="Directory leading to a csv file", csvDataset="Csv file to be read", csvDataframe="Name of Dataframe"):
        self.pathCsvDataset = pathCsvDataset
        self.csvDataset = os.chdir(self.pathCsvDataset) == csvDataset
        self.csvDataframe = pandas.read_csv(self.csvDataset) == csvDataframe

Bug fixed. New bug: ValueError: Invalid file path or buffer object type: <class 'bool'>

https://github.com/dask/hdfs3/issues/122

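The error is raised by pandas (pd.read_csv), but the cause is the constructor: os.chdir(self.pathCsvDataset) == csvDataset is a comparison, so self.csvDataset is the bool False rather than the file name.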
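The `bool` in the message can be traced to the assignment itself. `os.chdir` returns `None`, and `None == "train.csv"` evaluates to `False`, so the value handed to `pandas.read_csv` is a bool. Below is a small sketch of that effect and of one possible alternative: joining directory and file name instead of changing directory. `build_csv_path` is a hypothetical helper, not part of the project.

```python
import os
import tempfile


def build_csv_path(path_csv_dataset: str, csv_dataset: str) -> str:
    """Join directory and file name into a single path string."""
    return os.path.join(path_csv_dataset, csv_dataset)


original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as directory:
    try:
        # Same pattern as in the constructor: a comparison, not an assignment.
        compared = os.chdir(directory) == "train.csv"
    finally:
        os.chdir(original_cwd)  # leave the temp dir so it can be cleaned up
    joined = build_csv_path(directory, "train.csv")

print(type(compared).__name__, compared)
print(os.path.basename(joined))
```

The string returned by `build_csv_path` is the kind of value `pandas.read_csv` accepts as its first argument.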

Henrike-Schwenn commented 3 years ago

19.07.21 50 min

RMSLE
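RMSLE (root mean squared logarithmic error) is the evaluation metric of the Bike Sharing Demand competition. A minimal sketch of the usual definition, sqrt(mean((log(p + 1) - log(a + 1))^2)), using only the standard library; the function name and the sample numbers are made up for illustration.

```python
import math
from typing import Sequence


def rmsle(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared logarithmic error of two equally long sequences."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2  # log1p(x) == log(x + 1)
        for p, a in zip(predicted, actual)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


perfect = rmsle([10, 20, 30], [10, 20, 30])  # identical values
one_log_unit_off = rmsle([0.0], [math.e - 1])  # log1p values differ by 1
print(perfect, one_log_unit_off)
```

Because the error is computed on logarithms, predicting 10 instead of 20 rentals is penalised as heavily as predicting 100 instead of 200.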

Henrike-Schwenn commented 3 years ago

23.07.21 25 min

Henrike-Schwenn commented 3 years ago

26.07.21 50 min

Lessons learnt:

print(trainingSetFirstCycle.dtypes)
datetime      datetime64[ns]
season                 int64
holiday                int64
workingday             int64
weather                int64
temp                 float64
atemp                float64
humidity               int64
windspeed            float64
casual                 int64
registered             int64
rent_count           float64
dtype: object

Cloning the fastai package into PyCharm and installing it with pip install fastai

Installation froze

Henrike-Schwenn commented 3 years ago

27.07.21 50 min

      Successfully uninstalled pip-21.1.1
WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
Successfully installed pip-21.2.1
Henrike-Schwenn commented 3 years ago

29.07.21 25 min

What is encoding?

Assigning written characters to specific bit sequences (bit patterns)

Examples

https://en.wikipedia.org/wiki/Character_encoding

Windows 10 uses UTF-8 by default. https://techcommunity.microsoft.com/t5/windows-10/windows-10-1903-how-to-change-default-encoding-utf-8-to-ansi-in/m-p/991268

Python 2's default source encoding is ASCII (Python 3 defaults to UTF-8). So is python.exe encoded in ASCII? Either switch the encoding to ASCII in the PyCharm settings or add the line # -*- coding: utf-8 -*- to the file.

If you read the error carefully enough, PyCharm tells you everything you need to solve this problem:

SyntaxError: Non-UTF-8 code starting with '\x90' in file C:\Users\pli\AppData\Local\Programs\Python\Python35-32\python.exe on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

Now, as stated in PEP 0263:

Python will default to ASCII as standard encoding if no other encoding hints are given ... To define a source code encoding, a magic comment must be placed into the source files either as first or second line in the file.

Have you tried sticking a magic comment dictating the source code like:

# -*- coding: utf-8 -*-

https://stackoverflow.com/questions/35668967/syntaxerror-in-python-exe-on-pycharm-5-0-4
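How a magic comment affects decoding can be checked with the standard library's `tokenize.detect_encoding`, which applies the PEP 263 rules to the first two lines of a source file. This is a sketch for illustration only; `declared_encoding` and the two byte strings are made-up examples. In Python 3 the fallback without a magic comment is UTF-8, whereas the ASCII default quoted above describes Python 2.

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the encoding Python would use to decode this source code."""
    encoding, _first_lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


without_comment = declared_encoding(b"print('hello')\n")
with_comment = declared_encoding(b"# -*- coding: latin-1 -*-\nprint('hello')\n")
print(without_comment, with_comment)
```

The magic comment only applies to Python source files, so it cannot be used on an executable such as python.exe.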

Lossy Encoding

This inspection warns you of characters that the current document encoding is incapable to represent. For example, when you are

  • typing international characters in a document configured to US-ASCII charset. Some characters will be lost on save.
  • or loading UTF-8-encoded file using ISO-8859-1 one-byte charset. Some characters will be displayed incorrectly.

You fix this by changing the file encoding, either by specifying the encoding directly in the file, e.g. by editing encoding= attribute in the XML prolog of XML file, or configuring the Settings|Project Settings|File Encodings.
Henrike-Schwenn commented 3 years ago

03.08.21 30 min

Non-ASCII characters

Reports code that uses non-ASCII symbols in suspicious context. For example:

  • Non-ASCII characters in identifiers, strings, or comments
  • Identifiers written in different languages, such as myCollection with letter C written in Cyrillic.
  • Comments or strings containing Unicode symbols, such as long dashes and arrows

Documentation - Encoding

Henrike-Schwenn commented 3 years ago

05.08.21 30 min

Issue submitted to JetBrains support:

Dear JetBrains support team,

I tried to install the package "torch", which produced the following error:

Try to run this command from the system terminal. Make sure that you use the correct version of 'pip' installed for your Python interpreter located at 'C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts\python.exe'.

When I checked the file "python.exe", it turned out to have been falsely encoded in UTF-8, which Windows uses by default. This is what the content of the file currently looks like:

MZ�   �� � @  � � �!�L�!This program cannot be run in DOS mode.

[remaining binary content of python.exe omitted]

Turning to "Inspections" I found the following warnings:

Lossy Encoding

This inspection warns you of characters that the current document encoding is incapable to represent. For example, when you are

  • typing international characters in a document configured to US-ASCII charset. Some characters will be lost on save.
  • or loading UTF-8-encoded file using ISO-8859-1 one-byte charset. Some characters will be displayed incorrectly.

You fix this by changing the file encoding, either by specifying the encoding directly in the file, e.g. by editing encoding= attribute in the XML prolog of XML file, or configuring the Settings|Project Settings|File Encodings.

    Non-ASCII characters

    Reports code that uses non-ASCII symbols in suspicious context. For example:

    Non-ASCII characters in identifiers, strings, or comments
    Identifiers written in different languages, such as myCollection with letter C written in Cyrillic.
    Comments or strings containing Unicode symbols, such as long dashes and arrows

So "python.exe" can't be run on DOS, contains non-ASCII characters and probably needs to be reloaded in the correct encoding. However, every encoding to choose from contains a warning or an error. I understand that reloading in another encoding could change or damage the file. So how do I figure out which encoding to choose?

Kind regards

Henrike Schwenn

Henrike-Schwenn commented 3 years ago

11.08.21 60 min

Antonina Belianskaya (IntelliJ)

Aug 6, 2021, 14:38 GMT+2

Hi Henrike, Thank you for contacting PyCharm support.

A package installation failure in most cases is not IDE-related behavior. It is more about a venv/python/package-specific. It can be tested, by installing the package in the same venv OUT of PyCharm in cmd/terminal. Here is a guide on how to perform this https://intellij-support.jetbrains.com/hc/en-us/articles/360010202240

Open the command line and run:

C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts\activate.bat
pip install torch

What will be the result?

The "python.exe" is hardly related here, it looks the same on my machine and does not affect the package installation process.

Kind regards, Tonya https://www.jetbrains.com The Drive to Develop
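The reply is consistent with python.exe being a compiled program rather than a text file. Windows executables start with the two bytes "MZ", which is also how the dump in the message to support begins, so no text encoding applies to the file. A minimal sketch of such a check follows; `executable_kind` is a hypothetical helper, and the file written here is a stand-in made of a few bytes, not a real executable.

```python
import tempfile
from pathlib import Path

# Leading bytes of two common executable formats (not an exhaustive list).
MAGIC_NUMBERS = {
    b"MZ": "Windows PE executable",
    b"\x7fELF": "ELF executable",
}


def executable_kind(path: str) -> str:
    """Return a label if the file starts with a known executable signature."""
    with open(path, "rb") as handle:  # binary mode: no decoding takes place
        head = handle.read(4)
    for magic, label in MAGIC_NUMBERS.items():
        if head.startswith(magic):
            return label
    return "unknown"


with tempfile.TemporaryDirectory() as folder:
    fake_exe = Path(folder) / "python.exe"
    fake_exe.write_bytes(b"MZ\x90\x00" + b"\x00" * 60)  # stand-in bytes only
    kind = executable_kind(str(fake_exe))

print(kind)
```

Calling `executable_kind(sys.executable)` would run the same check on the interpreter that is currently in use.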

Windows PowerShell
Copyright (C) Microsoft Corporation. All rights reserved.

Try the new cross-platform PowerShell https://aka.ms/pscore6

PS C:\Users\henri> cd C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts\activate.bat
cd : Cannot find path "C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts\activate.bat" because it does not exist.
At line:1 char:1
+ cd C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_r ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (C:\Users\henri\...ts\activate.bat:String) [Set-Location], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.SetLocationCommand

PS C:\Users\henri> chdir C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts\activate.bat
chdir : Cannot find path "C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts\activate.bat" because it does not exist.
At line:1 char:1
+ chdir C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bik ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (C:\Users\henri\...ts\activate.bat:String) [Set-Location], ItemNotFoundException
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.SetLocationCommand

PS C:\Users\henri> cd C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts
PS C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts> ls

    Directory: C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a---l        04.05.2021     09:07           2350 activate
-a---l        04.05.2021     09:07           1027 activate.bat
-a---l        04.05.2021     09:07          18454 Activate.ps1
-a---l        04.05.2021     09:07            368 deactivate.bat
-a---l        04.05.2021     14:17          97199 f2py.exe
-a---l        27.07.2021     18:27          97204 pip.exe
-a---l        27.07.2021     18:27          97204 pip3.8.exe
-a---l        27.07.2021     18:27          97204 pip3.exe
-a---l        04.05.2021     09:06         420936 python.exe
-a---l        04.05.2021     09:06         419912 pythonw.exe

PS C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts> pip install torch
Requirement already satisfied: torch in c:\users\henri\appdata\local\programs\python\python39\lib\site-packages (1.9.0)
Requirement already satisfied: typing-extensions in c:\users\henri\appdata\local\programs\python\python39\lib\site-packages (from torch) (3.10.0.0)
WARNING: You are using pip version 21.2.1; however, version 21.2.3 is available.
You should consider upgrading via the 'c:\users\henri\appdata\local\programs\python\python39\python.exe -m pip install --upgrade pip' command.
PS C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts> c:\users\henri\appdata\local\programs\python\python39\python.exe -m pip install --upgrade pip
Requirement already satisfied: pip in c:\users\henri\appdata\local\programs\python\python39\lib\site-packages (21.2.1)
Collecting pip
  Downloading pip-21.2.3-py3-none-any.whl (1.6 MB)
     |████████████████████████████████| 1.6 MB 2.2 MB/s
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 21.2.1
    Uninstalling pip-21.2.1:
      Successfully uninstalled pip-21.2.1
Successfully installed pip-21.2.3
PS C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts> pip install torch
Requirement already satisfied: torch in c:\users\henri\appdata\local\programs\python\python39\lib\site-packages (1.9.0)
Requirement already satisfied: typing-extensions in c:\users\henri\appdata\local\programs\python\python39\lib\site-packages (from torch) (3.10.0.0)
PS C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts>

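torch seems to have been installed successfully. However, the paths in the output point to the global Python 3.9 installation (python39\lib\site-packages), not to the venv.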

PS C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts> pip install fastai

Successfully installed MarkupSafe-2.0.1 blis-0.7.4 catalogue-2.0.4 certifi-2021.5.30 charset-normalizer-2.0.4 click-7.1.2 colorama-0.4.4 cycler-0.10.0 cymem-2.0.5 fastai-2.5.1 fastcore-1.3.26 fastdownload-0.0.5 fastprogress-1.0.0 idna-3.2 jinja2-3.0.1 joblib-1.0.1 kiwisolver-1.3.1 matplotlib-3.4.2 murmurhash-1.0.5 numpy-1.21.1 packaging-21.0 pandas-1.3.1 pathy-0.6.0 pillow-8.3.1 preshed-3.0.5 pydantic-1.8.2 pyparsing-2.4.7 python-dateutil-2.8.2 pytz-2021.1 pyyaml-5.4.1 requests-2.26.0 scikit-learn-0.24.2 scipy-1.7.1 six-1.16.0 smart-open-5.1.0 spacy-3.1.1 spacy-legacy-3.0.8 srsly-2.4.1 thinc-8.0.8 threadpoolctl-2.2.0 torchvision-0.10.0 tqdm-4.62.0 typer-0.3.2 urllib3-1.26.6 wasabi-0.8.2

'fastai' successfully installed via Powershell

Follow the instructions: PyCharm can't install/import a package/library/module

Troubleshooting:

Try installing/importing a package from the system terminal (outside of PyCharm) using the same interpreter/environment.

Understanding results:

If it fails with the same error as in PyCharm - the problem is most likely not related to PyCharm. Search the web for similar problems and possible solutions (StackOverflow, python forums, etc.). It is likely to be related to pip, your environment or some compatibility issue.

If it is installed/executed successfully - just to be sure, check one more time that you are using the same environment/interpreter and if so, file an issue to PyCharm issue tracker providing the information as described in the paragraph below.
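Whether two shells really use "the same environment/interpreter" can be checked by running a short script in each of them and comparing the output. This is a sketch under stated assumptions: `describe_interpreter` is a hypothetical helper, and the values it prints depend entirely on the machine it runs on.

```python
import sys


def describe_interpreter() -> dict:
    """Collect the facts that identify the running Python interpreter."""
    return {
        "executable": sys.executable,
        "version": "{}.{}".format(*sys.version_info[:2]),
        # Inside a venv, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != getattr(sys, "base_prefix", sys.prefix),
    }


info = describe_interpreter()
for key, value in info.items():
    print(f"{key}: {value}")
```

Installing with `python -m pip install <package>` through a specific python.exe, instead of a bare `pip`, ties the installation to that interpreter.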

(venv) C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Sprachtechnologie\Predicting_Bike_Rental_Demand\venv\Scripts> pip install torch
Fatal error in launcher: Unable to create process using '"c:\users\henri\onedrive\dokumente\berufseinstieg\sprachtechnologie\prediciting_bike_rental_demand\venv\scripts\python.exe"  "C:\U
sers\henri\OneDrive\Dokumente\Berufseinstieg\Sprachtechnologie\Predicting_Bike_Rental_Demand\venv\Scripts\pip.exe" install torch': The system cannot find the file specified.

Compare paths
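Comparing the interpreter directory named in the launcher error with the directory shown in the prompt, component by component, shows where they diverge. A minimal sketch follows; `differing_parts` is a hypothetical helper, and the two strings are copied from the output above. The comparison ignores case because Windows paths are case-insensitive.

```python
from itertools import zip_longest
from pathlib import PureWindowsPath
from typing import List, Tuple


def differing_parts(path_a: str, path_b: str) -> List[Tuple[str, str]]:
    """Return the path components that differ, ignoring upper/lower case."""
    parts_a = [part.lower() for part in PureWindowsPath(path_a).parts]
    parts_b = [part.lower() for part in PureWindowsPath(path_b).parts]
    # zip_longest pads the shorter path with "" so extra components show up too.
    return [(a, b) for a, b in zip_longest(parts_a, parts_b, fillvalue="") if a != b]


launcher_dir = r"c:\users\henri\onedrive\dokumente\berufseinstieg\sprachtechnologie\prediciting_bike_rental_demand\venv\scripts"
prompt_dir = r"C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Sprachtechnologie\Predicting_Bike_Rental_Demand\venv\Scripts"

differences = differing_parts(launcher_dir, prompt_dir)
print(differences)
```

The only differing component is the project folder: the launcher looks for "prediciting_bike_rental_demand", while the prompt shows "Predicting_Bike_Rental_Demand".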

(venv) C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts> pip install torch
WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
Collecting torch
  Using cached torch-0.1.2.post2.tar.gz (128 kB)
Requirement already satisfied: pyyaml in c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages (from torch) (5.4.1)
WARNING: The candidate selected for download or install is a yanked version: 'torch' candidate (version 0.1.2.post2 at https://files.pythonhosted.org/packages/f8/02/880b468bd382dc79896eae
cbeb8ce95e9c4b99a24902874a2cef0b562cea/torch-0.1.2.post2.tar.gz#sha256=a43e37f8f927c5b18f80cd163daaf6a1920edafcab5102e02e3e14bb97d9c874 (from https://pypi.org/simple/torch/))
Reason for being yanked: 0.1.2 is past it's support date and confuses users on unsupported platforms
Using legacy 'setup.py install' for torch, since package 'wheel' is not installed.
WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
Installing collected packages: torch
    Running setup.py install for torch ... error
    ERROR: Command errored out with exit status 1:
     command: 'c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\scripts\python.exe' -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0]
= '"'"'C:\\Users\\henri\\AppData\\Local\\Temp\\pip-install-k7l5d9hx\\torch_982e5be03e574abea0284b2d0d08fe2f\\setup.py'"'"'; __file__='"'"'C:\\Users\\henri\\AppData\\Local\\Temp\\pip-insta
ll-k7l5d9hx\\torch_982e5be03e574abea0284b2d0d08fe2f\\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools
import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\henri\AppData\Local\T
emp\pip-record-_9lo0nt8\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand
-master\venv\include\site\python3.8\torch'
         cwd: C:\Users\henri\AppData\Local\Temp\pip-install-k7l5d9hx\torch_982e5be03e574abea0284b2d0d08fe2f\
    Complete output (23 lines):
    running install
    running build_deps
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\henri\AppData\Local\Temp\pip-install-k7l5d9hx\torch_982e5be03e574abea0284b2d0d08fe2f\setup.py", line 225, in <module>
        setup(name="torch", version="0.1.2.post2",
      File "c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages\setuptools\__init__.py", line 153, in setup
        return distutils.core.setup(**attrs)
      File "C:\Users\henri\AppData\Local\Programs\Python\Python38-32\lib\distutils\core.py", line 148, in setup
        dist.run_commands()
      File "C:\Users\henri\AppData\Local\Programs\Python\Python38-32\lib\distutils\dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "C:\Users\henri\AppData\Local\Programs\Python\Python38-32\lib\distutils\dist.py", line 985, in run_command
        cmd_obj.run()
      File "C:\Users\henri\AppData\Local\Temp\pip-install-k7l5d9hx\torch_982e5be03e574abea0284b2d0d08fe2f\setup.py", line 99, in run
        self.run_command('build_deps')
      File "C:\Users\henri\AppData\Local\Programs\Python\Python38-32\lib\distutils\cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "C:\Users\henri\AppData\Local\Programs\Python\Python38-32\lib\distutils\dist.py", line 985, in run_command
        cmd_obj.run()
      File "C:\Users\henri\AppData\Local\Temp\pip-install-k7l5d9hx\torch_982e5be03e574abea0284b2d0d08fe2f\setup.py", line 51, in run
        from tools.nnwrap import generate_wrappers as generate_nn_wrappers
    ModuleNotFoundError: No module named 'tools.nnwrap'
    ----------------------------------------
ERROR: Command errored out with exit status 1: 'c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\scripts\python.exe' -u -c 'import io, os, sys, s
etuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\henri\\AppData\\Local\\Temp\\pip-install-k7l5d9hx\\torch_982e5be03e574abea0284b2d0d08fe2f\\setup.py'"'"'; __file__='"'"'C:\\Users\\henri
\\AppData\\Local\\Temp\\pip-install-k7l5d9hx\\torch_982e5be03e574abea0284b2d0d08fe2f\\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else
io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --recor
d 'C:\Users\henri\AppData\Local\Temp\pip-record-_9lo0nt8\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\henri\onedrive\dokumente\berufseinst
ieg\predicting_bike_rental_demand-master\venv\include\site\python3.8\torch' Check the logs for full command output.
WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
WARNING: You are using pip version 21.2.1; however, version 21.2.3 is available.
You should consider upgrading via the 'c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\scripts\python.exe -m pip install --upgrade pip' command.

Well, that's some improvement.

Henrike-Schwenn commented 3 years ago

16.08.21 75 min

Lessons learned:

Retry installing torch via powershell C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Sprachtechnologie\Predicting_Bike_Rental_Demand\venv\Scripts>

Windows PowerShell
Copyright (C) Microsoft Corporation. All rights reserved.

Try the new cross-platform PowerShell https://aka.ms/pscore6

PS C:\Users\henri> cd C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Sprachtechnologie\Predicting_Bike_Rental_Demand\venv\Scripts
PS C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Sprachtechnologie\Predicting_Bike_Rental_Demand\venv\Scripts> pip install torch
Requirement already satisfied: torch in c:\users\henri\appdata\local\programs\python\python39\lib\site-packages (1.9.0)
Requirement already satisfied: typing-extensions in c:\users\henri\appdata\local\programs\python\python39\lib\site-packages (from torch) (3.10.0.0)

Retry pip install torch in PyCharm

(venv) C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Sprachtechnologie\Predicting_Bike_Rental_Demand\venv\Scripts> pip install torch
Fatal error in launcher: Unable to create process using '"c:\users\henri\onedrive\dokumente\berufseinstieg\sprachtechnologie\prediciting_bike_rental_demand\venv\scripts\python.exe"  "C:\U
sers\henri\OneDrive\Dokumente\Berufseinstieg\Sprachtechnologie\Predicting_Bike_Rental_Demand\venv\Scripts\pip.exe" install torch': The system cannot find the file specified.

Just noticed that I was supposed to compare the interpreters, not the directories.

So PyCharm is using an older Python interpreter (3.8, 32-bit) than PowerShell (3.9). This is probably the root of the problem.
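Besides the version, the logs show a second difference: the venv interpreter reports "32 bit (Intel)" and lives under Python38-32. The word size of whichever interpreter is running can be checked as sketched below. Whether the 32-bit build is the reason pip fell back to the yanked torch 0.1.2.post2 source archive is an assumption here, not something the log states.

```python
import struct
import sys


def interpreter_bits() -> int:
    """Return the pointer size of the running interpreter in bits."""
    return struct.calcsize("P") * 8  # "P" is the struct code for a C pointer


bits = interpreter_bits()
print(f"Python {sys.version_info.major}.{sys.version_info.minor}, {bits}-bit")
```

Running this once with the venv's python.exe and once with the global one makes the two interpreters directly comparable.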

Changed interpreter via File|Settings:

[screenshot]

Henrike-Schwenn commented 3 years ago

Draft for JetBrains Issue Tracker

Steps to Reproduce

  1. Tried to install torch in PyCharm:

    PS C:\Users\henri> cd C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts
    pip install torch

    Error:

    Try to run this command from the system terminal. Make sure that you use the correct version of 'pip' installed for your Python interpreter located at 'C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts\python.exe'.

  2. Turned to support. Advised to run the following in Windows PowerShell, i.e. to try installing torch in the same directory but outside PyCharm:

    C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts\activate.bat
    pip install torch

  3. Ran C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts> pip install torch instead, because activate.bat is a file, so C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts\activate.bat isn't a directory I can move to.

  4. Result: torch installed successfully, upgraded pip:

    Requirement already satisfied: torch in c:\users\henri\appdata\local\programs\python\python39\lib\site-packages (1.9.0)
    Requirement already satisfied: typing-extensions in c:\users\henri\appdata\local\programs\python\python39\lib\site-packages (from torch) (3.10.0.0)
    WARNING: You are using pip version 21.2.1; however, version 21.2.3 is available.
    You should consider upgrading via the 'c:\users\henri\appdata\local\programs\python\python39\python.exe -m pip install --upgrade pip' command.
    PS C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts> c:\users\henri\appdata\local\programs\python\python39\python.exe -m pip install --upgrade pip
    Requirement already satisfied: pip in c:\users\henri\appdata\local\programs\python\python39\lib\site-packages (21.2.1)
    Collecting pip
      Downloading pip-21.2.3-py3-none-any.whl (1.6 MB)
         |████████████████████████████████| 1.6 MB 2.2 MB/s
    Installing collected packages: pip
      Attempting uninstall: pip
        Found existing installation: pip 21.2.1
        Uninstalling pip-21.2.1:
          Successfully uninstalled pip-21.2.1
    Successfully installed pip-21.2.3

  5. Retried pip install torch in PyCharm, which produced the following error message:

    (venv) C:\Users\henri\OneDrive\Dokumente\Berufseinstieg\Predicting_bike_rental_demand-master\venv\Scripts> pip install torch
    WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
    WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
    Collecting torch
    Using cached torch-0.1.2.post2.tar.gz (128 kB)
    Requirement already satisfied: pyyaml in c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages (from torch) (5.4.1)
    WARNING: The candidate selected for download or install is a yanked version: 'torch' candidate (version 0.1.2.post2 at https://files.pythonhosted.org/packages/f8/02/880b468bd382dc79896eae
    cbeb8ce95e9c4b99a24902874a2cef0b562cea/torch-0.1.2.post2.tar.gz#sha256=a43e37f8f927c5b18f80cd163daaf6a1920edafcab5102e02e3e14bb97d9c874 (from https://pypi.org/simple/torch/))
    Reason for being yanked: 0.1.2 is past it's support date and confuses users on unsupported platforms
    Using legacy 'setup.py install' for torch, since package 'wheel' is not installed.
    WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
    Installing collected packages: torch
    Running setup.py install for torch ... error
    ERROR: Command errored out with exit status 1:
     command: 'c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\scripts\python.exe' -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0]
    = '"'"'C:\\Users\\henri\\AppData\\Local\\Temp\\pip-install-k7l5d9hx\\torch_982e5be03e574abea0284b2d0d08fe2f\\setup.py'"'"'; __file__='"'"'C:\\Users\\henri\\AppData\\Local\\Temp\\pip-insta
    ll-k7l5d9hx\\torch_982e5be03e574abea0284b2d0d08fe2f\\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools
    import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\henri\AppData\Local\T
    emp\pip-record-_9lo0nt8\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand
    -master\venv\include\site\python3.8\torch'
         cwd: C:\Users\henri\AppData\Local\Temp\pip-install-k7l5d9hx\torch_982e5be03e574abea0284b2d0d08fe2f\
    Complete output (23 lines):
    running install
    running build_deps
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\henri\AppData\Local\Temp\pip-install-k7l5d9hx\torch_982e5be03e574abea0284b2d0d08fe2f\setup.py", line 225, in <module>
        setup(name="torch", version="0.1.2.post2",
      File "c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages\setuptools\__init__.py", line 153, in setup
        return distutils.core.setup(**attrs)
      File "C:\Users\henri\AppData\Local\Programs\Python\Python38-32\lib\distutils\core.py", line 148, in setup
        dist.run_commands()
      File "C:\Users\henri\AppData\Local\Programs\Python\Python38-32\lib\distutils\dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "C:\Users\henri\AppData\Local\Programs\Python\Python38-32\lib\distutils\dist.py", line 985, in run_command
        cmd_obj.run()
      File "C:\Users\henri\AppData\Local\Temp\pip-install-k7l5d9hx\torch_982e5be03e574abea0284b2d0d08fe2f\setup.py", line 99, in run
        self.run_command('build_deps')
      File "C:\Users\henri\AppData\Local\Programs\Python\Python38-32\lib\distutils\cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "C:\Users\henri\AppData\Local\Programs\Python\Python38-32\lib\distutils\dist.py", line 985, in run_command
        cmd_obj.run()
      File "C:\Users\henri\AppData\Local\Temp\pip-install-k7l5d9hx\torch_982e5be03e574abea0284b2d0d08fe2f\setup.py", line 51, in run
        from tools.nnwrap import generate_wrappers as generate_nn_wrappers
    ModuleNotFoundError: No module named 'tools.nnwrap'
    ----------------------------------------
    ERROR: Command errored out with exit status 1: 'c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\scripts\python.exe' -u -c 'import io, os, sys, s
    etuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\henri\\AppData\\Local\\Temp\\pip-install-k7l5d9hx\\torch_982e5be03e574abea0284b2d0d08fe2f\\setup.py'"'"'; __file__='"'"'C:\\Users\\henri
    \\AppData\\Local\\Temp\\pip-install-k7l5d9hx\\torch_982e5be03e574abea0284b2d0d08fe2f\\setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else
    io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --recor
    d 'C:\Users\henri\AppData\Local\Temp\pip-record-_9lo0nt8\install-record.txt' --single-version-externally-managed --compile --install-headers 'c:\users\henri\onedrive\dokumente\berufseinst
    ieg\predicting_bike_rental_demand-master\venv\include\site\python3.8\torch' Check the logs for full command output.
    WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
    WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
    WARNING: Ignoring invalid distribution -ip (c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\lib\site-packages)
    WARNING: You are using pip version 21.2.1; however, version 21.2.3 is available.
    You should consider upgrading via the 'c:\users\henri\onedrive\dokumente\berufseinstieg\predicting_bike_rental_demand-master\venv\scripts\python.exe -m pip install --upgrade pip' command.
  5. Followed the instructions of support article: PyCharm can't install/import a package/library/module

Troubleshooting:

Try installing/importing a package from the system terminal (outside of PyCharm) using the same interpreter/environment.

Understanding results:

If it fails with the same error as in PyCharm - the problem is most likely not related to PyCharm. Search the web for similar problems and possible solutions (StackOverflow, python forums, etc.). It is likely to be related to pip, your environment or some compatibility issue.

If it is installed/executed successfully - just to be sure, check one more time that you are using the same environment/interpreter and if so, file an issue to PyCharm issue tracker providing the information as described in the paragraph below.

So torch has been installed successfully from the system terminal using the same interpreter/environment.

  1. Checked one more time that I was using the same environment/interpreter.

Expected Result

Actual Result

Please provide any additional information below. Attach a code sample, a screenshot, or a screencast if possible. Please attach the IDE logs. You can get them by selecting "Collect Logs and Diagnostic Data" from the "Help" menu.

Henrike-Schwenn commented 3 years ago

07.09.21 50 min

fastai.add_datepart()

def add_datepart(df, fldname, drop=True):
    fld = df[fldname]
    if not np.issubdtype(fld.dtype, np.datetime64):
        df[fldname] = fld = pd.to_datetime(fld, infer_datetime_format=True)
    targ_pre = re.sub('[Dd]ate$', '', fldname)
    for n in ('Year', 'Month', 'Week', 'Day', 'Dayofweek', 'Dayofyear', 'Is_month_end', 'Is_month_start', 'Is_quarter_end', 'Is_quarter_start', 'Is_year_end', 'Is_year_start'):
        df[targ_pre+n] = getattr(fld.dt,n.lower())
    df[targ_pre+'Elapsed'] = fld.astype(np.int64) // 10**9
    if drop: df.drop(fldname, axis=1, inplace=True)

Understanding the code

def add_datepart(df, fldname, drop=True): Parameters are the dataframe, the name of the field containing the datetime you want to split, and the option to drop said field. After you've split the datetime field into several numerical fields, you no longer need it.

fld = df[fldname] Stores the column itself (a pandas Series) in the short variable fld, so the rest of the function doesn't have to repeat df[fldname].

if not np.issubdtype(fld.dtype, np.datetime64): df[fldname] = fld = pd.to_datetime(fld, infer_datetime_format=True) If the values aren't stored as datetime, convert them to datetime.

targ_pre = re.sub('[Dd]ate$', '', fldname) Uses re.sub to strip a trailing "Date" or "date" from the column name. The result is a string that serves as the prefix for the names of the new columns. A name like "datetime" doesn't end in "date", so it stays unchanged.

for n in ('Year', 'Month', 'Week', 'Day', 'Dayofweek', 'Dayofyear', 'Is_month_end', 'Is_month_start', 'Is_quarter_end', 'Is_quarter_start', 'Is_year_end', 'Is_year_start'): df[targ_pre+n] = getattr(fld.dt,n.lower()) For each of these attributes (year, month, week, etc.), read the value from the datetime column via fld.dt and store it in a new column named targ_pre plus the attribute name.

df[targ_pre+'Elapsed'] = fld.astype(np.int64) // 10**9 Converts the datetime to an integer (nanoseconds since 1970-01-01 00:00:00) and divides by 10**9, which gives the elapsed time in seconds.

if drop: df.drop(fldname, axis=1, inplace=True) If drop is True, remove the original datetime field.
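To check my reading of the function, here is a minimal sketch of the same steps in plain pandas. It is not the fastai implementation: the two sample rows are made up, only a few of the attributes are extracted, and the elapsed seconds are computed with a timedelta division instead of astype(np.int64) // 10**9.

```python
import re
import pandas as pd

# Made-up sample rows in the style of the bike sharing data.
df = pd.DataFrame({"datetime": pd.to_datetime(["2011-01-01 00:00:00",
                                               "2011-01-01 01:00:00"])})
fldname = "datetime"
fld = df[fldname]

# Prefix for the new columns: a trailing "Date"/"date" is stripped from the name.
targ_pre = re.sub("[Dd]ate$", "", fldname)        # "datetime" stays "datetime"
other_prefix = re.sub("[Dd]ate$", "", "saledate")  # "saledate" becomes "sale"

# Look up each attribute on the .dt accessor by its lowercase name.
for n in ("Year", "Month", "Day", "Dayofweek", "Is_month_end"):
    df[targ_pre + n] = getattr(fld.dt, n.lower())

# Seconds elapsed since 1970-01-01 00:00:00.
df[targ_pre + "Elapsed"] = (fld - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)

print(df.columns.tolist())
```

The column names this produces (datetimeYear, datetimeElapsed, ...) follow the same prefix-plus-attribute pattern as the function above.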

Try

trainingSetFirstCycle=add_datepart(trainingSetFirstCycle, datetime, drop=True)

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.1.1\plugins\python-ce\helpers\pydev\pydevd.py", line 1483, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.1.1\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/TrainingSet.py", line 15, in <module>
    trainingSetFirstCycle=add_datepart(trainingSetFirstCycle, datetime, drop=True)
NameError: name 'add_datepart' is not defined

Process finished with exit code -1

Suggested solution: from fastai.tabular.core import add_datepart

Works.

Henrike-Schwenn commented 3 years ago

08.09.21 25 min

Try trainingSetFirstCycle=add_datepart(trainingSetFirstCycle, trainingSetFirstCycle.datetime, drop=True)

Result raise KeyError(f"None of [{key}] are in the [{axis_name}]") KeyError: "None of [DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 01:00:00',\n

Researching the error

Comparing the documentation of "add_datepart" grafik -> Values of "datetime" need to be listed in the definition of the dataframe!

Henrike-Schwenn commented 3 years ago

15.09.21 50 min

Split "datetime" into "date" and "time"

https://cmdlinetips.com/2018/11/how-to-split-a-text-column-in-pandas/

grafik

Doesn't work here because "datetime" doesn't contain strings.

trainingSetFirstCycle[['date','time']]=trainingSetFirstCycle.datetime.str.split(" ", expand=True,)

AttributeError: Can only use .str accessor with string values!

https://stackoverflow.com/questions/53653980/split-datetime-in-pandas

grafik

Suggestion to convert the column to a datetime dtype with pd.to_datetime instead.

Ran trainingSetFirstCycle.datetime=pandas.to_datetime(trainingSetFirstCycle.datetime); the column is still datetime64[ns]. But let's see whether this line has any effect on "add_datepart".

It has not.

grafik

trainingSetFirstCycle=({'datetime': pandas.date_range('2011-01-01 00:00:00', periods=5)})
trainingSetFirstCycle['date'] = [d.date() for d in trainingSetFirstCycle['datetime']]
trainingSetFirstCycle['time'] = [d.time() for d in trainingSetFirstCycle['datetime']]

AttributeError: 'dict' object has no attribute 'datetime'
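A note on this error: the curly braces on their own create a plain Python dict, not a data frame, so attribute access like .datetime isn't available. A minimal sketch of the same idea with the dict wrapped in pandas.DataFrame, using two made-up rows and the .dt accessor (instead of .str, which only works on strings):

```python
import pandas as pd

# A dict of columns only becomes a data frame once it is passed to DataFrame().
columns = {"datetime": pd.to_datetime(["2011-01-01 00:00:00",
                                       "2011-01-01 01:00:00"])}
frame = pd.DataFrame(columns)

# .dt works on datetime columns; .date and .time give the two halves.
split = frame.assign(date=frame["datetime"].dt.date,
                     time=frame["datetime"].dt.time)
print(split)
```

The new "date" and "time" columns have dtype object, so they would still need converting before a model can use them.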

grafik

Try trainingSetFirstCycle.assign(date=trainingSetFirstCycle.datetime.dt.date, time=trainingSetFirstCycle.datetime.dt.time)

 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 01:00:00',\n               '2011-01-01 02:00:00', '2011-01-01 03:00:00',\n               '2011-01-01 04:00:00', '2011-01-01 05:00:00',\n               '2011-01-01 06:00:00', '2011-01-01 07:00:00',\n               '2011-01-01 08:00:00', '2011-01-01 09:00:00',\n               ...\n               '2012-12-19 14:00:00', '2012-12-19 15:00:00',\n               '2012-12-19 16:00:00', '2012-12-19 17:00:00',\n               '2012-12-19 18:00:00', '2012-12-19 19:00:00',\n               '2012-12-19 20:00:00', '2012-12-19 21:00:00',\n               '2012-12-19 22:00:00', '2012-12-19 23:00:00'],\n              dtype='datetime64[ns]', length=10886, freq=None)] are in the [columns]"

Back to key error. Needs to be fixed before performing any operations on "datetime".

Henrike-Schwenn commented 3 years ago

Fix key error "datetime"

1. Try


trainingSetFirstCycle = pandas.read_csv(                                                                                  
    "C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/train.csv",
    low_memory=False, parse_dates=["datetime"],                                                                           
    {'datetime': ['2011-01-01 00:00:00', '2011-01-01 01:00:00', '2011-01-01 02:00:00', '2011-01-01 03:00:00']})  

SyntaxError: positional argument follows keyword argument

2. Try

trainingSetFirstCycle = pandas.read_csv(
"C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/train.csv", {'datetime': ['2011-01-01 00:00:00', '2011-01-01 01:00:00', '2011-01-01 02:00:00', '2011-01-01 03:00:00']},
low_memory=False, parse_dates=["datetime"])

AttributeError: 'dict' object has no attribute 'encode'

  3. Try
trainingSetFirstCycle = pandas.read_csv(                                                                                  
    "C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/train.csv",
    {'datetime': ['2011-01-01 00:00:00', '2011-01-01 01:00:00', '2011-01-01 02:00:00', '2011-01-01 03:00:00']},           
    low_memory=False) 

AttributeError: 'dict' object has no attribute 'encode'
  4. Try

    • Add index=datetime
trainingSetFirstCycle = pandas.read_csv(                                                                                  
    "C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/train.csv",
    index='datetime', low_memory=False, parse_dates=['datetime'])    

TypeError: read_csv() got an unexpected keyword argument 'index'

Take a closer look at parse_dates

parse_dates bool or list of int or names or list of lists or dict, default False

The behavior is as follows:

    boolean. If True -> try parsing the index.

    list of int or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.

    list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.

    dict, e.g. {‘foo’ : [1, 3]} -> parse columns 1, 3 as date and call result ‘foo’

If a column or index cannot be represented as an array of datetimes, say because of an unparsable value or a mixture of timezones, the column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv. To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with utc=True. See Parsing a CSV with mixed timezones for more.

Note: A fast-path exists for iso8601-formatted dates.
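For reference, a minimal sketch of how parse_dates is used with a column name. The CSV text is a made-up stand-in for train.csv, read from memory via io.StringIO. The second call uses index_col, which is the read_csv keyword for choosing an index column (there is no keyword called index).

```python
import io
import pandas as pd

# Made-up stand-in for the first rows of train.csv.
csv_text = (
    "datetime,season,rent_count\n"
    "2011-01-01 00:00:00,1,16\n"
    "2011-01-01 01:00:00,1,40\n"
)

# Parse the "datetime" column as dates and keep it as a regular column.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"])

# Same file, but with "datetime" used as the index.
indexed = pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"],
                      index_col="datetime")

print(df.dtypes)
print(indexed.index)
```

In the second variant the "datetime" column no longer shows up in dtypes, because it has become the index.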

Idea: Splitting 'datetime' column before parsing it as date column?

Henrike-Schwenn commented 3 years ago

Index column has no header, part of "datetime"

trainingSetFirstCycle.dtypes
season          int64
holiday         int64
workingday      int64
weather         int64
temp          float64
atemp         float64
humidity        int64
windspeed     float64
casual          int64
registered      int64
rent_count    float64
dtype: object
Henrike-Schwenn commented 3 years ago

Try setting further parameters, see if it has any effect on the datetime column

infer_datetime_format : bool, default False

    If True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by 5-10x.
trainingSetFirstCycle = pandas.read_csv(
    "C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/train.csv",
   low_memory=False, parse_dates=['datetime'], infer_datetime_format=True)

raise KeyError(f"None of [{key}] are in the [{axis_name}]")

Henrike-Schwenn commented 3 years ago

Split using Excel

grafik

trainingSetFirstCycle = pandas.read_csv(
    "C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/train_datetime_split.csv",
   low_memory=False, parse_dates=["date"])

ValueError: Missing column provided to 'parse_dates': 'date'

Henrike-Schwenn commented 3 years ago

Check ValueError: Missing column provided to 'parse_dates': 'date'

Looks just perfect

trainingSetFirstCycle.dtypes
date;time;season;holiday;workingday;weather;temp;atemp;humidity;windspeed;casual;registered;rent_count    object
dtype: object

But it's not. Apparently, pandas has parsed all columns as one: the file uses semicolons as separators, while read_csv expects commas by default.
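A minimal sketch of the effect, assuming the split file is separated by semicolons as the dtypes output suggests. The CSV text is a made-up stand-in for train_datetime_split.csv.

```python
import io
import pandas as pd

# Made-up stand-in for a semicolon-separated export.
csv_text = (
    "date;time;season;rent_count\n"
    "2011-01-01;00:00:00;1;16\n"
    "2011-01-01;01:00:00;1;40\n"
)

# Default separator is a comma: every line ends up in one single column.
wrong = pd.read_csv(io.StringIO(csv_text))

# Naming the separator gives four columns, and "date" can be parsed.
right = pd.read_csv(io.StringIO(csv_text), sep=";", parse_dates=["date"])

print(wrong.columns.tolist())
print(right.dtypes)
```

Alternatively, the file can be saved again with commas as separators, in which case the original read_csv call works unchanged.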

Henrike-Schwenn commented 3 years ago

Fix datatype "date"

trainingSetFirstCycle = pandas.read_csv(

"C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/train_datetime_split.csv",
   low_memory=False, parse_dates=["date"])

trainingSetFirstCycle.dtypes
date          datetime64[ns]
Henrike-Schwenn commented 3 years ago

Retry add_datepart()

trainingSetFirstCycle = add_datepart(trainingSetFirstCycle, trainingSetFirstCycle.date, drop=True)

KeyError: "None of [DatetimeIndex(['2011-01-01', '2011-01-01', '2011-01-01', '2011-01-01',\n .....dtype='datetime64[ns]', length=10886, freq=None)] are in the [columns]"
Henrike-Schwenn commented 3 years ago

Post at StackOverflow add_datepart( ) produces KeyError

Hey everyone,

I'm trying to split a date column in a pandas data frame using add_datepart( ).

trainingSetFirstCycle = pandas.read_csv(

"C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/train_datetime_split.csv",
   low_memory=False, parse_dates=["date"])
trainingSetFirstCycle.rent_count = numpy.log(trainingSetFirstCycle.rent_count)

trainingSetFirstCycle
            date      time  season  ...  casual  registered  rent_count
0     2011-01-01  00:00:00       1  ...       3          13    2.772589
1     2011-01-01  01:00:00       1  ...       8          32    3.688879
2     2011-01-01  02:00:00       1  ...       5          27    3.465736
3     2011-01-01  03:00:00       1  ...       3          10    2.564949
4     2011-01-01  04:00:00       1  ...       0           1    0.000000
          ...       ...     ...  ...     ...         ...         ...
10881 2012-12-19  19:00:00       4  ...       7         329    5.817111
10882 2012-12-19  20:00:00       4  ...      10         231    5.484797
10883 2012-12-19  21:00:00       4  ...       4         164    5.123964
10884 2012-12-19  22:00:00       4  ...      12         117    4.859812
10885 2012-12-19  23:00:00       4  ...       4          84    4.477337
[10886 rows x 13 columns]

trainingSetFirstCycle.dtypes
date          datetime64[ns]

However, running trainingSetFirstCycle = add_datepart(trainingSetFirstCycle, trainingSetFirstCycle.date, drop=True) returns this error message:

raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 01:00:00',\n ... dtype='datetime64[ns]', length=10886, freq=None)] are in the [columns]"

I checked the documentation to see what I'd done wrong.

image

In the example shown, the definition of the data frame includes a dictionary consisting of the column name "date" and a list containing its first four values. So I reproduced this in my own data frame:

trainingSetFirstCycle = pandas.read_csv(
    "C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/train_datetime_split.csv",
   {'date': ['2011-01-01', '2011-01-01 ', '2011-01-01', '2011-01-01']}, low_memory=False, parse_dates=["date"])

The result was this error message:

AttributeError: 'dict' object has no attribute 'encode'.

So, do you have an idea what it is I'm missing here? Thanks in advance.

Henrike-Schwenn commented 3 years ago

Bugfix add_datepart

Comment StackOverflow

add_datepart wants the name of the column, not its values – Paul H Oct 26 at 15:37

-> Conclusion: add_datepart doesn't treat trainingSetFirstCycle.date as the column's name, because that expression is the column's values. I had assumed the column should be referred to this way because of numpy.log(trainingSetFirstCycle.rent_count), and supposed that whenever you give a data frame column as a parameter to a function, you have to refer to it as df.column. -> Tried several variations; trainingSetFirstCycle = add_datepart(trainingSetFirstCycle, "date", drop=True) worked.

This has been a semantic bug. trainingSetFirstCycle = add_datepart(trainingSetFirstCycle, trainingSetFirstCycle.date, drop=True) is valid Python, but it failed at runtime with a KeyError: add_datepart expects the column's name as a string and looks the column up with it, whereas trainingSetFirstCycle.date passes the column's values. I changed the parameter to simply "date", which solved the problem.

Apparently, my misconception was that whenever you give a data frame column as a parameter to a function, you have to refer to it as df.column, because that was the syntax of numpy.log(trainingSetFirstCycle.rent_count), where the data frame column "rent_count" is the parameter of the function numpy.log. The difference is that numpy.log works on the values themselves, while add_datepart needs the name in order to find the column in the data frame.
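A minimal sketch that reproduces the difference on a tiny made-up data frame: indexing with the column's name works, indexing with the column's values makes pandas search for those values among the column labels and raise the KeyError seen above.

```python
import pandas as pd

# Tiny made-up frame with the same column names as the training set.
df = pd.DataFrame({"date": pd.to_datetime(["2011-01-01", "2011-01-02"]),
                   "rent_count": [16, 40]})

# Lookup by name: returns the column.
by_name = df["date"]

# Lookup by values: pandas treats the dates as column labels and finds none.
try:
    df[df.date]
    error_message = None
except KeyError as err:
    error_message = str(err)

print(error_message)
```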

Henrike-Schwenn commented 3 years ago

Turn "time" from object into a numerical format

Convert "time" into datetime format

trainingSetFirstCycle = pandas.read_csv(
    "C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/train_datetime_split.csv",
    low_memory=False, parse_dates=["date", "time"])
trainingSetFirstCycle.dtypes
date          datetime64[ns]
time          datetime64[ns]

Try

trainingSetFirstCycle.time = pandas.Timestamp(trainingSetFirstCycle.time, unit='s')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "pandas\_libs\tslibs\timestamps.pyx", line 1332, in pandas._libs.tslibs.timestamps.Timestamp.__new__
  File "pandas\_libs\tslibs\conversion.pyx", line 445, in pandas._libs.tslibs.conversion.convert_to_tsobject
TypeError: Cannot convert input [0       2021-11-08 00:00:00
1       2021-11-08 01:00:00
2       2021-11-08 02:00:00
3       2021-11-08 03:00:00
4       2021-11-08 04:00:00
                ...        
10881   2021-11-08 19:00:00
10882   2021-11-08 20:00:00
10883   2021-11-08 21:00:00
10884   2021-11-08 22:00:00
10885   2021-11-08 23:00:00
Name: time, Length: 10886, dtype: datetime64[ns]] of type <class 'pandas.core.series.Series'> to Timestamp

Is it because I turned in the name of a column or because the column doesn't contain dates?

pandas.Timestamp(1513393355.5, unit='s')
Timestamp('2017-12-16 03:02:35.500000')
pandas.Timestamp(52345, unit='s')
Timestamp('1970-01-01 14:32:25')
pandas.Timestamp(1,unit='s')
Timestamp('1970-01-01 00:00:01')

This is how .Timestamp (unit='s') works:
- Take a float or int
- Return date + time
- Start counting on 1970-01-01 00:00:00
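This also hints at why the call on the whole column failed: pandas.Timestamp builds one single timestamp from one value. A minimal sketch of the column-wise alternatives, assuming made-up sample values: pandas.to_datetime for a whole Series of seconds, and .dt.hour to get a plain number out of a time column.

```python
import pandas as pd

# One value in, one Timestamp out.
single = pd.Timestamp(52345, unit="s")

# A whole column of seconds: to_datetime works element by element.
seconds = pd.Series([1, 52345])
converted = pd.to_datetime(seconds, unit="s")

# A time column as numbers: parse the strings, then take the hour.
times = pd.to_datetime(pd.Series(["00:00:00", "13:00:00"]), format="%H:%M:%S")
hours = times.dt.hour

print(converted)
print(hours)
```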

Conclusion Processing date as a separate column seems to be more trouble than it's worth. Keep column "datetime" for the first cycle.

Henrike-Schwenn commented 3 years ago

Detect and fill missing values

Return a boolean same-sized object indicating if the values are NA. NA values, such as None or numpy.NaN, gets mapped to True values. Everything else gets mapped to False values. Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).

Returns

    DataFrame

        Mask of bool values for each element in DataFrame that indicates whether an element is an NA value.

pandas.isna(obj)[source]

Detect missing values for an array-like object.

This function takes a scalar or array-like object and indicates whether values are missing (NaN in numeric arrays, None or NaN in object arrays, NaT in datetimelike).

Parameters

    obj : scalar or array-like

        Object to check for null or missing values.

Returns

    bool or array-like of bool

        For scalar input, returns a scalar boolean. For array input, returns an array of boolean indicating whether each corresponding element is missing.

Run

trainingSetFirstCycle.isnull()
       season  holiday  ...  datetimeIs_year_start  datetimeElapsed
0       False    False  ...                  False            False
1       False    False  ...                  False            False
2       False    False  ...                  False            False
3       False    False  ...                  False            False
4       False    False  ...                  False            False
       ...      ...  ...                    ...              ...
10881   False    False  ...                  False            False
10882   False    False  ...                  False            False
10883   False    False  ...                  False            False
10884   False    False  ...                  False            False
10885   False    False  ...                  False            False
[10886 rows x 24 columns]

You don't see every value. How do I make sure I'm not missing true values?

pandas.isna(trainingSetFirstCycle)
       season  holiday  ...  datetimeIs_year_start  datetimeElapsed
0       False    False  ...                  False            False
1       False    False  ...                  False            False
2       False    False  ...                  False            False
3       False    False  ...                  False            False
4       False    False  ...                  False            False
       ...      ...  ...                    ...              ...
10881   False    False  ...                  False            False
10882   False    False  ...                  False            False
10883   False    False  ...                  False            False
10884   False    False  ...                  False            False
10885   False    False  ...                  False            False
[10886 rows x 24 columns]

Next steps

Test csv with missing value

Teesorten, Kommunikationsblocker, Unsinnige Zeichenfolgen
"Gunpowder", "Sich an einem Fischbrötchen verschlucken", "bet0934vnar"
"Darjeeling", "Unmoralische Angebote", "35fknvf"
"Sencha", , "gwoenv634"

test.isnull()
   Teesorten   Kommunikationsblocker   Unsinnige Zeichenfolgen
0      False                   False                     False
1      False                   False                     False
2      False                   False                     False

By default, isnull() will not treat an empty string as missing value.

Characters such as empty strings '' or numpy.inf are not considered NA values (unless you set pandas.options.mode.use_inf_as_na = True).

pd.options.mode.use_inf_as_na = True
test.isnull()
   Teesorten   Kommunikationsblocker   Unsinnige Zeichenfolgen
0      False                   False                     False
1      False                   False                     False
2      False                   False                     False

What is numpy.inf, anyway?

numpy.Inf

IEEE 754 floating point representation of (positive) infinity.

Use inf because Inf, Infinity, PINF and infty are aliases for inf. For more details, see inf.

Inf is short for infinity.

Rerun on actual df.

pandas.options.mode.use_inf_as_na = True
trainingSetFirstCycle.isnull()
       season  holiday  ...  datetimeIs_year_start  datetimeElapsed
0       False    False  ...                  False            False
1       False    False  ...                  False            False
2       False    False  ...                  False            False
3       False    False  ...                  False            False
4       False    False  ...                  False            False
       ...      ...  ...                    ...              ...
10881   False    False  ...                  False            False
10882   False    False  ...                  False            False
10883   False    False  ...                  False            False
10884   False    False  ...                  False            False
10885   False    False  ...                  False            False
[10886 rows x 24 columns]

Column 1, Column 2, Column 3
5,7,8
27,235,346
62,7,
24,,345

Two missing values

test = pd.read_csv("C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/Datasets/TestCsvFile2.csv")
test
   Column 1   Column 2   Column 3
0         5        7.0        8.0
1        27      235.0      346.0
2        62        7.0        NaN
3        24        NaN      345.0
test.isna()
   Column 1   Column 2   Column 3
0     False      False      False
1     False      False      False
2     False      False       True
3     False       True      False

Conclusion: If there are some empty values hidden in trainingSetFirstCycle, pandas.isna() won't point them out to me.
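Since the printed mask is truncated for a large data frame, one way to check every cell without reading the output row by row is to aggregate the mask. A minimal sketch on the small test frame from above, rebuilt by hand here so it doesn't depend on the CSV file:

```python
import numpy as np
import pandas as pd

# Same values as TestCsvFile2.csv, typed in directly.
test = pd.DataFrame({"Column 1": [5, 27, 62, 24],
                     "Column 2": [7.0, 235.0, 7.0, np.nan],
                     "Column 3": [8.0, 346.0, np.nan, 345.0]})

# Number of missing values per column.
missing_per_column = test.isna().sum()

# One single True/False for the whole frame.
any_missing = bool(test.isna().any().any())

# Only the rows that contain at least one missing value.
rows_with_missing = test[test.isna().any(axis=1)]

print(missing_per_column)
print(any_missing)
print(rows_with_missing)
```

The same three lines can be run on trainingSetFirstCycle; a sum of 0 in every column would mean no NaN is hidden in the truncated rows.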

proc_df function

An interesting little topic that came up during the week concerns the proc_df function. proc_df does the following:

Finds numeric columns with missing values, creates an additional boolean column, and replaces the missing values with the median.
Turns the categorical objects into integer codes.

grafik 1—Introduction to Random Forests

Test

testFilled = fastai.structured.proc_df(test)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
AttributeError: module 'fastai' has no attribute 'structured'

from fastai import structured
Traceback (most recent call last):
  File "<input>", line 1, in <module>
ImportError: cannot import name 'structured' from 'fastai' (C:\Users\henri\AppData\Local\Programs\Python\Python39\lib\site-packages\fastai\__init__.py)

proc_df is contained in fastai.structured, which fails to be imported.

from fastai.tabular.core import add_datepart
from fastai.structured.proc_DF import *
ModuleNotFoundError: No module named 'fastai.structured'

The problem seems to be that PyCharm doesn't find the module 'fastai.structured'. Does it work outside PyCharm?

Try the new cross-platform PowerShell https://aka.ms/pscore6

PS C:\Users\henri> python
Python 3.9.5 (tags/v3.9.5:0a7dcbd, May 3 2021, 17:27:52) [MSC v.1928 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import fastai
>>> import fastai.structured
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'fastai.structured'


- [x] Check if fastai.structured is still up to date
"structured" and "proc_DF" produce no results in the fastai documentation. I think they are not used in the current version anymore.
Current function according to [documentation](https://docs.fast.ai/tabular.core.html#FillMissing)

class FillMissing [source]

`FillMissing(fill_strategy=median, add_col=True, fill_vals=None)` :: TabularProc

Fill the missing values in continuous columns.
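To see what median filling with an extra indicator column means in practice, here is a minimal sketch in plain pandas on the small test frame from above. It is not fastai's FillMissing, only an illustration of the idea; the helper name fill_missing_with_median and the "_na" suffix for the indicator column are my own assumptions.

```python
import numpy as np
import pandas as pd

# Same values as TestCsvFile2.csv, typed in directly.
test = pd.DataFrame({"Column 1": [5, 27, 62, 24],
                     "Column 2": [7.0, 235.0, 7.0, np.nan],
                     "Column 3": [8.0, 346.0, np.nan, 345.0]})

def fill_missing_with_median(df):
    """Hypothetical helper: median-fill numeric columns that contain NaN."""
    out = df.copy()
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]) and df[col].isna().any():
            # Remember where the gaps were before filling them.
            out[col + "_na"] = df[col].isna()
            out[col] = df[col].fillna(df[col].median())
    return out

filled = fill_missing_with_median(test)
print(filled)
```

"Column 1" has no gaps, so it gets no indicator column and stays unchanged.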

Henrike-Schwenn commented 2 years ago

Split set into training and validation sets

from fastcore.basics import range_of
splits = RandomSplitter()(range_of(trainingSetFirstCycle_main))
NameError: name 'RandomSplitter' is not defined
from fastai.data.transforms import RandomSplitter
splits = RandomSplitter()(range_of(trainingSetFirstCycle_main))
NameError: name 'trainingSetFirstCycle_main' is not defined
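As far as I understand it, the splitter receives the row positions and returns two lists of positions, one for training and one for validation. A minimal sketch of that idea using only the standard library; this is not fastai's RandomSplitter, and the function name random_split, the 20 % validation share and the seed are assumptions for illustration.

```python
import random

def random_split(n_items, valid_pct=0.2, seed=42):
    """Hypothetical helper: shuffle the positions 0..n_items-1 and cut them in two."""
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)
    cut = int(valid_pct * n_items)
    # First part of the shuffled list is validation, the rest is training.
    return idxs[cut:], idxs[:cut]

# 10886 is the number of rows in the training data frame.
train_idxs, valid_idxs = random_split(10886)
print(len(train_idxs), len(valid_idxs))
```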

I should follow the entire integration example

Integration example

For a more in-depth explanation, see the tabular tutorial

path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')
df_main,df_test = df.iloc[:10000].copy(),df.iloc[10000:].copy()
df_test.drop('salary', axis=1, inplace=True)
df_main.head()

cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [Categorify, FillMissing, Normalize]
splits = RandomSplitter()(range_of(df_main))

to = TabularPandas(df_main, procs, cat_names, cont_names, y_names="salary", splits=splits)

dls = to.dataloaders()
dls.valid.show_batch()

What's df.iloc?

pandas.DataFrame.iloc

property DataFrame.iloc

Purely integer-location based indexing for selection by position.

.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.

Allowed inputs are:

    An integer, e.g. 5.

    A list or array of integers, e.g. [4, 3, 0].

    A slice object with ints, e.g. 1:7.

    A boolean array.

    A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above). This is useful in method chains, when you don’t have a reference to the calling object, but would like to base your selection on some value.

.iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing (this conforms with python/numpy slice semantics).
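A minimal sketch of the kinds of .iloc selection listed above, on a made-up frame of 15 rows. The slicing mirrors the df.iloc[:10000] / df.iloc[10000:] split in the integration example, just with smaller numbers.

```python
import pandas as pd

# Made-up frame with 15 rows.
df = pd.DataFrame({"x": range(15)})

# Slices: the first 10 rows and everything after them.
df_main, df_test = df.iloc[:10].copy(), df.iloc[10:].copy()

# A single integer position (row 4, column 0) and a list of positions.
single_value = df.iloc[4, 0]
picked = df.iloc[[4, 3, 0]]

# A slice may run past the end without raising an error.
past_end = df.iloc[10:100]

print(len(df_main), len(df_test))
```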

What's TabularPandas?

class TabularPandas [source]

TabularPandas(df, procs=None, cat_names=None, cont_names=None, y_names=None, y_block=None, splits=None, do_setup=True, device=None, inplace=False, reduce_memory=True) :: Tabular

A Tabular object with transforms

This does the job: trainingFirstCycle = TabularPandas(dfFirstCycle_train, procs, cat_names, cont_names, y_names="rent_count", splits=splits)

Henrike-Schwenn commented 2 years ago

Isolate dependent variable

A single column is a pandas Series, which has no to_feather method. Use pandas.Series.to_csv instead.
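A minimal sketch of the options, using two made-up values for the dependent variable: Series.to_csv writes the single column directly, and Series.to_frame turns it back into a one-column data frame in case Feather is wanted after all.

```python
import pandas as pd

# Made-up values standing in for the dependent variable column.
rent_count = pd.Series([16, 40], name="rent_count")

# A Series has to_csv but no to_feather.
has_feather = hasattr(rent_count, "to_feather")

# Without a path, to_csv returns the CSV text instead of writing a file.
csv_text = rent_count.to_csv(index=False)

# A one-column data frame does offer to_feather again.
as_frame = rent_count.to_frame()

print(has_feather)
print(csv_text)
```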

Henrike-Schwenn commented 2 years ago

Save trainingSet to Feather Format

Reading And Writing In Feather Format

What I need to save training set in feather

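import pandas as pd  # DataFrame.to_feather needs the pyarrow package installed
pingInfoFilePath = "./serverpings.ftr"
dataFrame = pd.DataFrame(data=pingInfo)  # pingInfo is the data collected in the tutorial
dataFrame.to_feather(pingInfoFilePath)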

Saving training set

dfFirstCycle_train.to_feather("C:/Users/henri/OneDrive/Dokumente/Berufseinstieg/Sprachtechnologie/Predicting_Bike_Rental_Demand/FirstCycle/trainingFir

Only works with a pandas dataframe.