Closed: gevezex closed this issue 3 years ago.
I'm not sure. I'd first try to find out what the problem is by EXPLAINing the query and looking at where the database actually spends so much time.
When using recursive CTEs, each query has to build the whole tree from the start. It may be that 70k rows are a bit much for this, but I'd first check whether the problem can be fixed by using a different ordering field. I'm not sure.
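To illustrate what "build the whole tree from the start" means, here is a minimal sqlite3 sketch. The table `node` and its columns `id`, `parent_id`, `name` are hypothetical, and this is not the SQL that django-tree-queries actually generates. The first query starts at the roots and materializes a path for every node before filtering; the second starts at the node of interest and only follows its `parent_id` chain, so it visits roughly depth-many rows.

```python
import sqlite3

# Hypothetical schema: a single table holding both folders and files.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES node(id), name TEXT)"
)
conn.executemany(
    "INSERT INTO node (id, parent_id, name) VALUES (?, ?, ?)",
    [(1, None, "root"), (2, 1, "docs"), (3, 2, "2021"), (4, 3, "report.pdf"), (5, 1, "img")],
)

# Top-down: start at every root and build a path string for *every* node,
# then pick out the one we want. The work grows with the whole table.
TOP_DOWN = """
WITH RECURSIVE tree(id, path) AS (
    SELECT id, ',' || id || ',' FROM node WHERE parent_id IS NULL
    UNION ALL
    SELECT node.id, tree.path || node.id || ','
    FROM node JOIN tree ON node.parent_id = tree.id
)
SELECT path FROM tree WHERE id = ?
"""

# Bottom-up: start at the node and follow parent_id until the root.
# Only the node and its ancestors are visited.
BOTTOM_UP = """
WITH RECURSIVE up(id, parent_id, depth) AS (
    SELECT id, parent_id, 0 FROM node WHERE id = ?
    UNION ALL
    SELECT node.id, node.parent_id, up.depth + 1
    FROM node JOIN up ON node.id = up.parent_id
)
SELECT id FROM up WHERE depth > 0 ORDER BY depth DESC
"""


def ancestors_top_down(node_id):
    (path,) = conn.execute(TOP_DOWN, (node_id,)).fetchone()
    # path looks like ",1,2,3,4,"; drop the node itself (the last entry).
    return [int(x) for x in path.strip(",").split(",")][:-1]


def ancestors_bottom_up(node_id):
    return [row[0] for row in conn.execute(BOTTOM_UP, (node_id,))]


print(ancestors_top_down(4))   # [1, 2, 3]
print(ancestors_bottom_up(4))  # [1, 2, 3]
```

Both return the ancestors root first; the difference is only in how many rows the recursion has to touch on the way.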
I am trying to create a (static) file browser in my app. This is my FolderTree model:
```python
from django.db import models
from tree_queries.models import TreeNode  # assuming django-tree-queries

# Document is another model in the same app (not shown).


class FolderTree(TreeNode):
    name = models.CharField(max_length=255)
    fullpath = models.CharField(max_length=255, null=False, blank=False, unique=True)
    isfile = models.BooleanField(default=False, null=False, blank=True)
    hasdoc = models.BooleanField(default=False, null=False, blank=True)
    document = models.ForeignKey(Document, null=True, blank=True, related_name='folder', on_delete=models.PROTECT)
    file_extension = models.CharField(max_length=10, null=True, blank=True)
    deleted = models.BooleanField(default=False, null=False, blank=True)
    descendant_files = models.IntegerField(null=True, blank=True)
    isarchived = models.BooleanField(default=False, null=True, blank=True)
    archived_files = models.IntegerField(null=True, blank=True)
    date = models.DateTimeField(null=True, blank=True)

    class Meta:
        ordering = ("name",)
        unique_together = (("name", "parent"),)
        indexes = [
            models.Index(fields=['name', 'isarchived']),
        ]

    def __str__(self):
        return self.name
```
I fill this model with the following class (TL;DR: it recursively scans a root directory and all its subfolders, and creates a FolderTree object for every file and directory, with a few extra fields):
```python
import datetime
import os
from pathlib import PurePosixPath

# FolderTree and Document are the app's models (see above).


class CreateFolderTrees:
    """
    Fill the FolderTree model by recursively scanning the
    root folder for files and directories.
    start_sequence: begins scanning and filling
    """

    def __init__(self, rootfolder) -> None:
        self.rootfolder = rootfolder

    def return_relative_path(self, fpath):
        p = PurePosixPath(fpath)
        if str(p) == '.':
            return 'root'
        else:
            return str(p.relative_to(self.rootfolder))

    def return_extension(self, fname):
        # Strip the dot from the suffix; suffixes of 10+ characters are dropped.
        ext = PurePosixPath(fname).suffix.replace('.', '')
        if len(ext) < 10:
            return ext
        else:
            return ''

    def folder_actions(self, dir_path, parent):
        """List a directory, create FolderTree objects for its files and dirs,
        and recurse into every subdirectory.
        """
        dirs = []
        files = []
        for entry in os.scandir(dir_path):
            if entry.is_dir():
                dirs.append(entry)
            elif entry.is_file():
                files.append(entry)

        for f in files:
            f_relative = self.return_relative_path(f.path)
            arguments = {
                "name": f.name,
                "isfile": True,
                "parent": parent,
                "fullpath": f_relative,
                "file_extension": self.return_extension(f.name),
                "date": datetime.datetime.fromtimestamp(f.stat().st_mtime),
            }
            try:
                # Link the file to its Document, if one exists for this path.
                arguments["document"] = Document.objects.get(file_path=f_relative)
                arguments["hasdoc"] = True
            except Document.DoesNotExist:
                pass
            FolderTree(**arguments).save()

        for d in dirs:
            d_relative = self.return_relative_path(d.path)
            d_date = datetime.datetime.fromtimestamp(d.stat().st_mtime)
            par = FolderTree(name=d.name, fullpath=d_relative, parent=parent, date=d_date)
            par.save()
            self.folder_actions(d.path, parent=par)

    def start_sequence(self):
        """Start the filling. Call this after initializing the class."""
        # Create the root node
        parent = FolderTree(name='root')
        parent.save()
        self.folder_actions(self.rootfolder, parent=parent)
```
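The same scan can also be expressed with `os.walk`, which already separates directories from files. This is a minimal sketch, assuming you want to collect the path information first and handle the database writes separately (for example in fewer queries than one `save()` per entry); `scan_tree` is a hypothetical helper and it does not look up `Document` objects. One difference from the class above: `os.walk` does not descend into symlinked directories by default, whereas `entry.is_dir()` follows symlinks.

```python
import os
from pathlib import Path


def scan_tree(rootfolder):
    """Return (fullpath, parent_fullpath, isfile) tuples for everything under rootfolder.

    Paths are relative to rootfolder and use '/' separators; the root itself is
    named 'root', mirroring return_relative_path() in the class above.
    """
    root = Path(rootfolder)
    rows = [("root", None, False)]
    # os.walk is top-down by default: a directory's row is appended while its
    # parent is visited, i.e. always before the rows of its own children.
    for dirpath, dirnames, filenames in os.walk(root):
        rel = Path(dirpath).relative_to(root).as_posix()
        parent = "root" if rel == "." else rel
        prefix = "" if rel == "." else rel + "/"
        dirnames.sort()  # sorting in place also fixes the order os.walk descends in
        for d in dirnames:
            rows.append((prefix + d, parent, False))
        for f in sorted(filenames):
            rows.append((prefix + f, parent, True))
    return rows
```

Because parents always come before their children, the rows could be inserted level by level, looking up each parent by its `fullpath`.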
After filling the model I count around 70k objects (about 4400 directories; the rest are files). When I look up the ancestors of one directory and count them:
```python
FolderTree.objects.get(pk=1000).ancestors().count()
```
With timeit I measured that this query takes about 500 ms.
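Since every row already stores a unique `fullpath`, the ancestors of a node could also be derived from that string alone, without a recursive query. This is a minimal sketch, assuming paths are stored relative to the root with '/' separators (as `return_relative_path()` produces them); `ancestor_paths` is a hypothetical helper. The resulting list could be passed to a single `FolderTree.objects.filter(fullpath__in=...)` lookup, which can use the unique index on `fullpath`. The root row created in `start_sequence()` has no `fullpath` set, so it would have to be fetched separately (e.g. with `parent__isnull=True`).

```python
from pathlib import PurePosixPath


def ancestor_paths(fullpath):
    """Return the fullpath of every ancestor directory, outermost first.

    'a/b/c.txt' -> ['a', 'a/b']. The synthetic root row is not included.
    """
    parents = list(PurePosixPath(fullpath).parents)  # e.g. a/b, a, .
    return [p.as_posix() for p in reversed(parents) if p.as_posix() != "."]


print(ancestor_paths("docs/2021/report.pdf"))  # ['docs', 'docs/2021']
```

This only holds as long as `fullpath` is kept in sync with `parent`; moving a node would require rewriting the paths of all its descendants.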
I meant: trace the executed SQL statements (for example with django-debug-toolbar) and EXPLAIN them. EXPLAIN is a PostgreSQL statement that shows the execution plan; EXPLAIN ANALYZE also executes the query and reports actual timings.
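As a stand-in that runs with only the standard library, the sketch below uses SQLite's `EXPLAIN QUERY PLAN` on a hypothetical `node` table with a bottom-up ancestors query. On PostgreSQL you would prefix the traced SQL with `EXPLAIN ANALYZE` instead; Django's `QuerySet.explain()` wraps the same idea. Plan text differs between databases and versions, so the sketch only prints it rather than asserting a particular plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES node(id), name TEXT)"
)
# Django creates an index for a ForeignKey by default; mirror that for parent_id.
conn.execute("CREATE INDEX node_parent_id ON node (parent_id)")

# Number of ancestors of one node: walk up the parent chain, then subtract
# the starting node itself.
ANCESTORS_SQL = """
WITH RECURSIVE up(id, parent_id) AS (
    SELECT id, parent_id FROM node WHERE id = ?
    UNION ALL
    SELECT node.id, node.parent_id FROM node JOIN up ON node.id = up.parent_id
)
SELECT count(*) - 1 FROM up
"""


def explain(sql, params=()):
    """Return SQLite's query-plan rows as strings (the last, 'detail', column)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


for line in explain(ANCESTORS_SQL, (4,)):
    print(line)
```

Reading the plan shows whether each recursive step is an index or primary-key lookup or a full table scan, which is the kind of information the suggestion above is after.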
But you say you have 4400 directories and about 66k files. Why not separate folders and files? Is it necessary to save files as FolderTree objects?
Maybe the performance problem just goes away when the CTE "only" has to process about 4400 folders.
Yeah, maybe that separation is a good idea. Since files do not have children, I could just add them as related objects of the folders. Will try that, thanks.
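The separation discussed here can be sketched with two tables: only folders take part in the recursive CTE, and files are plain rows pointing at their folder. This is a minimal sqlite3 sketch with hypothetical table and column names; in Django the file table would be an ordinary model with a ForeignKey to FolderTree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE folder (id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES folder(id), name TEXT);
CREATE INDEX folder_parent_id ON folder (parent_id);
CREATE TABLE file (id INTEGER PRIMARY KEY, folder_id INTEGER NOT NULL REFERENCES folder(id), name TEXT);
CREATE INDEX file_folder_id ON file (folder_id);
""")
conn.executemany(
    "INSERT INTO folder VALUES (?, ?, ?)",
    [(1, None, "root"), (2, 1, "docs"), (3, 2, "2021")],
)
conn.executemany(
    "INSERT INTO file (folder_id, name) VALUES (?, ?)",
    [(3, "report.pdf"), (3, "notes.txt"), (1, "readme.md")],
)

# The file table is joined once, in the anchor; the recursive step only
# ever joins folder rows.
FILE_ANCESTORS_SQL = """
WITH RECURSIVE up(id, parent_id, depth) AS (
    SELECT folder.id, folder.parent_id, 0
    FROM folder JOIN file ON file.folder_id = folder.id
    WHERE file.id = ?
    UNION ALL
    SELECT folder.id, folder.parent_id, up.depth + 1
    FROM folder JOIN up ON folder.id = up.parent_id
)
SELECT folder.name FROM up JOIN folder ON folder.id = up.id ORDER BY up.depth DESC
"""


def folder_chain(file_id):
    """Names of all folders containing the file, root first."""
    return [name for (name,) in conn.execute(FILE_ANCESTORS_SQL, (file_id,))]


print(folder_chain(1))  # ['root', 'docs', '2021']
```

With this layout the tree queries work on the roughly 4400 folder rows, while listing the files of a folder is a plain indexed lookup on `folder_id`.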
But my question was really whether you had experience with larger tables, or whether I was making a stupid mistake in my code (I always suspect myself first 🤣).
I unfortunately (fortunately? 😅) do not have experience with large trees. Large tables yes, but not large trees. Most trees I work with have tens or hundreds of nodes, seldom thousands.
I am using the StringOrderedModel to store around 70k objects with a maximum depth of around 6-8. When I try to find the ancestors of an object that has only 2 or 3 ancestors, it takes about 500 ms. I tried to put an index on the 'name' field, but that does not have an impact. Is this known behaviour, or is there something I could do differently?