Kismuz / btgym

Scalable, event-driven, deep-learning-friendly backtesting library
https://kismuz.github.io/btgym/
GNU Lesser General Public License v3.0

Cassandra DataFrame does not work #101

Closed: fu2re closed this issue 5 years ago

fu2re commented 5 years ago

I have made a custom dataset backed by Cassandra and overridden the read_csv method:

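import pandas as pd
from cassandra.cqlengine import CQLEngineException, connection

import settings  # project-local settings module (defines CASSANDRA_CLUSTER_NAME)

# BTgymDataset2 comes from btgym; the exact import path depends on the btgym version.


class CassandraDataSet(BTgymDataset2):
    # Assumes `cluster` (a cassandra.cluster.Cluster) is defined on the class elsewhere.

    @staticmethod
    def pandas_factory(colnames, rows, index=None):
        # Row factory: makes the driver return each result page as a pandas DataFrame.
        return pd.DataFrame(rows, columns=colnames, index=index)

    @classmethod
    def cassandra_connect(cls):
        # Reuse the default cqlengine connection if one is registered, otherwise register it.
        try:
            return connection.get_connection()
        except CQLEngineException:
            return connection.register_connection(
                settings.CASSANDRA_CLUSTER_NAME,
                session=cls.cluster.connect(), default=True
            )

    def read_csv(self, data_filename=None, force_reload=False):
        # Signature kept compatible with the base class; both arguments are unused here.
        session = self.cassandra_connect().session
        session.row_factory = self.pandas_factory
        session.default_fetch_size = None  # disable paging so all rows arrive in one result
        # Column names cannot be bound as parameters, so they are formatted in;
        # the partition key value is bound instead of being pasted into the string.
        # 'keyspace' is assumed to be a placeholder for the real keyspace name.
        query = 'SELECT {columns} FROM keyspace.chartdata WHERE name = %s ORDER BY ts ASC'.format(
            columns='"' + '", "'.join(self.names) + '"'
        )
        rslt = session.execute(query, (self.pair,), timeout=None)
        current_dataframe = rslt._current_rows
        if current_dataframe is None or current_dataframe.empty:
            # Fail early instead of hitting an IndexError on data_range[-1] below.
            raise ValueError('Cassandra returned no rows for name={!r}'.format(self.pair))
        current_dataframe = current_dataframe.set_index('ts')
        self.data = current_dataframe
        data_range = pd.to_datetime(self.data.index)
        self.total_num_records = self.data.shape[0]
        self.data_range_delta = (data_range[-1] - data_range[0]).to_pytimedelta()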
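Since read_csv() above only runs against a live cluster, the post-query steps can be checked offline first. This is a sketch under assumptions: the driver is assumed to pass the row factory a list of tuples in column order, and the helper prepare_frame and the sample rows are hypothetical, made up for illustration.

```python
import datetime as dt

import pandas as pd


def pandas_factory(colnames, rows):
    # Same idea as the row factory above: build a DataFrame from the driver's rows.
    return pd.DataFrame(rows, columns=colnames)


def prepare_frame(df, index_col='ts'):
    """Mimic the post-query steps of read_csv() without a Cassandra connection."""
    if df is None or df.empty:
        # An empty result would otherwise fail later with an IndexError on index[-1].
        raise ValueError('query returned no rows')
    df = df.set_index(index_col)
    df.index = pd.to_datetime(df.index)
    if not df.index.is_monotonic_increasing:
        df = df.sort_index()
    total_num_records = df.shape[0]
    data_range_delta = (df.index[-1] - df.index[0]).to_pytimedelta()
    return df, total_num_records, data_range_delta


# Hypothetical one-minute OHLCV rows, as the driver might hand them to the row factory.
colnames = ['ts', 'open', 'high', 'low', 'close', 'volume']
start = dt.datetime(2018, 1, 1, 0, 0)
rows = [
    (start + dt.timedelta(minutes=i), 1.0 + i, 1.5 + i, 0.5 + i, 1.2 + i, 100.0)
    for i in range(3)
]

frame, n_records, delta = prepare_frame(pandas_factory(colnames, rows))
print(n_records, delta)  # 3 0:02:00
```

Feeding it an empty row list raises the ValueError up front rather than an IndexError further down, which makes an empty query result easy to tell apart from other failures.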

Cassandra model:

from cassandra.cqlengine.models import Model, columns
from cassandra.cqlengine.management import sync_table
from cassandra.cluster import Cluster
from cassandra.cqlengine import connection
import settings

class ChartData(Model):
    __options__ = {
        'compaction': {'class': 'DateTieredCompactionStrategy',
                       'base_time_seconds': 3600,
                       'tombstone_compaction_interval': 86400},
        'default_time_to_live': 0
    }
    name = columns.Text(primary_key=True, partition_key=True)
    # Name of the data source; the full list of sources lives elsewhere, the current one is stored in the settings file.
    ts = columns.DateTime(primary_key=True)
    open = columns.Double()
    close = columns.Double()
    high = columns.Double()
    low = columns.Double()
    volume = columns.Double()

Then I simply opened the aac (A3C) example and replaced your dataset with my own. It shows the plot immediately, then gets stuck with a timeout error; no training steps are performed. I have traced the problem to btgym/server.py:711, btgym/rendering/renderer.py:232 ... backtrader/cerebro.py:996.

Your example code works fine and does not get stuck here: the plot is not displayed and training proceeds. What have I missed? Please help; I have spent a whole month on this and can't resolve it myself.

fu2re commented 5 years ago

Package versions you may be interested in:
backtrader==1.9.69.122
cassandra-driver==3.16.0
matplotlib==2.0.2
numpy==1.15.4
pandas==0.23.4
scikit-learn==0.20.0
scipy==1.1.0
tensorboard==1.12.0
tensorflow==1.12.0

Kismuz commented 5 years ago

@fu2re,

Short:

  1. Have you run your environment manually before attempting a distributed training setup? See this comment for details: https://github.com/Kismuz/btgym/issues/80#issuecomment-440780115

  2. Even before running the environment, have you run your dataset-trial-episode sampling cycle manually in a loop several times, as in https://github.com/Kismuz/btgym/blob/master/examples/data_domain_api_intro.ipynb?
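Both checks can be scripted before any distributed training is attempted. The sketch below is duck-typed and hypothetical: it assumes the reset()/sample() calls of the data_domain_api_intro notebook, the .data attribute that read_csv() sets, and the classic gym API of the time (reset() returns an observation, step() returns a 4-tuple). smoke_test_data and smoke_test_env are illustrative names, not btgym functions; verify the exact calls against the linked notebook and comment.

```python
def smoke_test_data(dataset, num_trials=3, num_episodes=3):
    """Run the dataset -> trial -> episode sampling cycle several times by hand.

    Assumes (not confirmed here) the reset()/sample() interface from the
    data_domain_api_intro notebook and a pandas-like `.data` on each episode.
    """
    dataset.reset()
    for _ in range(num_trials):
        trial = dataset.sample()
        trial.reset()
        for _ in range(num_episodes):
            episode = trial.sample()
            episode.reset()
            data = getattr(episode, 'data', None)
            # An empty episode points at the data source, not at the training setup.
            if data is None or data.shape[0] == 0:
                raise ValueError('sampled an empty episode')
    return True


def smoke_test_env(env, num_episodes=2, max_steps=100):
    """Step an environment with random actions; assumes the classic 4-tuple gym step()."""
    steps = []
    for _ in range(num_episodes):
        env.reset()
        done, n = False, 0
        while not done and n < max_steps:
            _, _, done, _ = env.step(env.action_space.sample())
            n += 1
        steps.append(n)
    env.close()
    return steps
```

A natural order is to call smoke_test_data on the custom dataset first and only then smoke_test_env on an environment built from it; constructor arguments are left out because they depend on the local setup.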

Expanded:

When developing such a fundamental upgrade in functionality, it is good practice to follow a modular testing approach. For this particular case I would recommend:

Kismuz commented 5 years ago

@fu2re, do I understand correctly that your proposed functionality is:

Please correct me if I have missed something.

fu2re commented 5 years ago

My CassandraDataSet is essentially the same as your BTgymDataset / BTgymDataset2. The only difference is that I get the data from Cassandra instead of a CSV file. I can't pass even your own tests in data_test: IndexError: index -1 is out of bounds for axis 0 with size 0


I got a similar error in my CassandraDataSet tests.
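The IndexError quoted above is the kind pandas raises when the last element of an empty index is requested, which is what the data_range[-1] line in read_csv() does if a query returns no rows. A minimal reproduction, assuming zero rows came back; this is one possible cause, not a confirmed diagnosis, and data_range_delta is a hypothetical helper name:

```python
import pandas as pd


def data_range_delta(df):
    """Reproduce the range computation from read_csv() on an already indexed DataFrame."""
    data_range = pd.to_datetime(df.index)
    return (data_range[-1] - data_range[0]).to_pytimedelta()


# Hypothetical empty query result: zero rows returned for the requested name.
empty = pd.DataFrame([], columns=['ts', 'open', 'high', 'low', 'close', 'volume']).set_index('ts')

try:
    data_range_delta(empty)
except IndexError as err:
    print('IndexError:', err)

# The same computation on two rows five minutes apart works as expected.
ok = pd.DataFrame(
    {'open': [1.0, 2.0]},
    index=pd.to_datetime(['2018-01-01 00:00', '2018-01-01 00:05']),
)
print(data_range_delta(ok))  # 0:05:00
```

If the empty case matches what the tests report, the next thing to check would be whether the dataset actually receives rows before sampling starts.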